CHAPTER I


INTRODUCTION

In analyzing a finite state, sequential machine the designer will often use
a flow-graph or flow-chart to describe the internal characteristics of the machine.
From these characteristics he can obtain a model representing the external
performance of the machine. By the external performance of the machine we are
referring to the input-output characteristics; i.e., for a given set of input signals
what is the output?

For simplicity we will only consider machines with two outputs, a 1 or a 0.
Thus, the inputs may be divided into two classes, those which produce a 1 output
(accepted or desired inputs) and the remainder which produce a 0 output (the
rejected inputs). The regular expression provides a formal method for
representing all of the possible inputs which are accepted.

The designer is often faced with the inverse problem of trying to design a
machine to accept the set of desired inputs and to reject all others. If the desired
inputs are represented as a regular expression the designer can obtain the internal
characteristics of the machine from the derivatives of the regular expression.

While the regular expression is a powerful tool, its use has been limited by
the overwhelming amount of work needed to obtain the sequential machine. This
paper describes a program which was written to find the derivatives of the regular
expression. The program was written for use with the SCC 650 digital computer
of the Electrical Engineering Department, Ohio University. This program is called
REXPRO, for Regular Expression PROcessor. While REXPRO will not perform all of
the operations needed to design a machine, it does do the "dirty work" associated
with the design.

Chapter II defines the regular expression and its derivative for the reader who
is not familiar with this technique. Several examples using the regular expression
are presented in this chapter. Chapter III introduces the concepts of string
processing, a method by which the computer is used to operate on non-numerical data
such as the regular expression. Chapter IV introduces the methods used in REXPRO
for identifying the form of the regular expression and which rule should be used to
form the derivative. Chapter V presents an outline of REXPRO and describes its
use.
CHAPTER II


THE REGULAR EXPRESSION

The finite state, sequential machine is a machine with a finite number of
states, and the machine can be in one and only one state at any given time.
An output is associated with each of the states. This is
the description of the "Moore" machine. This type of machine will be used
exclusively in this paper, but the results can be modified to include other forms.

The inputs to the machine consist of a set of characters, or symbols, which
are called literals. When a given input is seen the machine changes from the
present state to some new state. The new state is determined by the present state
and the present input. This is shown in Figure II.1.

In this example the machine is initially in state q1 and has an output of 0.
If the input symbol is an 'A' then the machine will go to state q2 and produce a
1 output. If a 'B' is seen the machine will go to state q3 and produce a zero
output.

The machine shown in Figure II.1 may also be described by the sequence,
or string, of characters which will take the machine from a starting state to a
state which will produce a 1. One of the sequences is the string 'A'. The
symbol 'B' will take the machine to state q3 and then a second 'B' will take the
machine from state q3 to state q2, so that the string 'BB' is another sequence which
will produce a 1 output. Continuing we find the strings: 'A', 'BB', 'BAA',
'BABA', 'BABBA', 'BABBBA', etc.

The literals 'A' and 'B' are not to be confused with the letters of the alphabet.
They are the names given to two of the possible input symbols; e.g., they may
represent the polarity of a voltage.

The regular set consists of the set of all of the possible input sequences which
will produce a 1 output. For this example we obtain,

S = {A, BB, BAA, BABA, BABBA, BABBBA, etc.}
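The regular set can be checked by simulating the machine directly. The following is a minimal sketch, not part of REXPRO: the transition table is reconstructed from the worked example in this chapter, the state names q1 to q4 follow that example, and a missing transition is assumed to mean that the sequence is rejected.

```python
from itertools import product

# Transition table reconstructed from the worked example in the text.
# Assumption: any (state, symbol) pair not listed leads to rejection.
TRANSITIONS = {
    ("q1", "A"): "q2",
    ("q1", "B"): "q3",
    ("q3", "A"): "q4",
    ("q3", "B"): "q2",
    ("q4", "A"): "q2",
    ("q4", "B"): "q4",
}
# Moore machine: one output per state.
OUTPUT = {"q1": 0, "q2": 1, "q3": 0, "q4": 0}


def run(sequence, start="q1"):
    """Return the output (0 or 1) of the state reached after the sequence."""
    state = start
    for symbol in sequence:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition listed: treat as rejected
            return 0
    return OUTPUT[state]


def regular_set(max_length):
    """List every accepted sequence up to max_length, shortest first."""
    accepted = []
    for n in range(max_length + 1):
        for letters in product("AB", repeat=n):
            word = "".join(letters)
            if run(word) == 1:
                accepted.append(word)
    return accepted


print(regular_set(5))
```

Enumerating by length reproduces the first members of S in the order they are listed above.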

The regular expression is a finite function which represents all of the
sequences contained in the regular set. The regular expression is formed by
recursively using the Boolean operators (AND, OR, and NOT), the concatenation
operator, and the star operator. For this example the regular expression,

R = A + BB + BAB*A

is obtained. The star operator is defined as,

A* = λ + A + AA + AAA + ...

where the lambda (λ) represents the null string, or string of zero length. Another
character which may appear in the regular expression is φ, representing the null
set.


Throughout this paper the AND operator is denoted by the ".", the OR by the
"+", the concatenation operator by juxtaposition, and the NOT operator by
brackets; e.g., [A] = ~(A).
Figure II.1 An Example of a Finite State Machine


Figure II.2 The Use of λ and φ
Figure II.2a shows the meaning of the lambda. In this example the initial
state already has the desired output (1) so a sequence of zero length is needed to
go from the initial state to the desired state. In Figure II.2b there is no way to
go from the initial state to a state which will produce a 1 output. Thus, the
regular expression is R = φ.

The  Derivative  Concept 

For the rest of this paper we are going to be concerned with the inverse
problem, or the design problem. In this class of problems the designer knows the
sequences of characters that are required to produce a 1 output.

To design a machine to accept these sequences the concept of the derivative
of a regular expression is introduced. The rules for forming the derivative are
presented below. The derivation of these rules can be found in [2].

II.1   D_a(a) = λ

II.2   D_a(b) = φ, for b = φ, b = λ, or b a literal ≠ a

II.3   D_a(X*) = D_a(X) X*

II.4   D_a(XY) = D_a(X) Y + η(X) D_a(Y)

II.5   D_a(f(X,Y)) = f(D_a(X), D_a(Y))

II.6   D_ab(X) = D_b(D_a(X))

where a and b are literals, X and Y are regular expressions or the result of
taking the derivative, and f is any Boolean function. The eta function is defined
as,

II.7   η(X) = λ, if λ ∈ X
       η(X) = φ, if λ ∉ X

II.8   η(XY) = η(X) . η(Y)

II.9   η(f(X,Y)) = f(η(X), η(Y))

It  can  be  shown  that  any  derivative  can  be  found  by  repeatedly  applying  the  above 
rules. 

To simplify the expression that is formed by using the derivative operation
the following identities are presented. Additional identities can be found in [3].

II.10  φX = Xφ = φ

II.11  φ + X = X + φ = X

II.12  λX = Xλ = X

II.13  φ . X = X . φ = φ
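As a worked illustration of how the rules and identities combine, the derivative of the example expression R = A + BB + BAB*A with respect to the literal B can be expanded term by term. The rule applied at each step is noted on the right; this expansion is an added illustration, not output from REXPRO.

```latex
\begin{align*}
D_B(A + BB + BAB^{*}A) &= D_B(A) + D_B(BB) + D_B(BAB^{*}A) && \text{(II.5)}\\
D_B(A) &= \phi && \text{(II.2)}\\
D_B(BB) &= D_B(B)\,B + \eta(B)\,D_B(B) = \lambda B + \phi\lambda = B && \text{(II.4, II.1, II.7, II.10--II.12)}\\
D_B(BAB^{*}A) &= D_B(B)\,AB^{*}A + \eta(B)\,D_B(AB^{*}A) = \lambda AB^{*}A + \phi = AB^{*}A && \text{(II.4, II.10--II.12)}\\
D_B(A + BB + BAB^{*}A) &= \phi + B + AB^{*}A = B + AB^{*}A && \text{(II.11)}
\end{align*}
```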

The process of obtaining the machine from the regular expression is a simple,
but tedious operation. First, associate the regular expression with an initial state
(q1). Second, take the derivative of the regular expression with respect to each
of the literals and assign a state to each one of these derivatives. Each of these new
states is connected to the initial state via a line directed from the initial state
to the new state. Each of these lines is given a value corresponding to the literal
which produced the derivative that was assigned to that state. Third, take the
derivative of each of the expressions found in step two. This process is repeated
until no new expressions are formed. Fourth, the expression assigned to each of
the states of the machine is tested to see if it contains lambda (this is the same as
the eta function mentioned above). Then a 1 output is assigned to each state
for which its corresponding expression contains lambda. A 0 output is assigned
to all other states.
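The four steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the REXPRO algorithm: expressions are nested tuples, only OR, concatenation, and star are handled (AND and NOT are omitted), all function names are hypothetical, and the simplification X + X = X is an added assumption beyond identities II.10 to II.12. With only these simplifications the loop is not guaranteed to stop for every expression, so a state limit is included.

```python
# Regular expressions as nested tuples; PHI is the null set, LAM the null string.
PHI, LAM = ("phi",), ("lam",)

def lit(a):
    return ("lit", a)

def alt(x, y):  # OR, simplified with II.11 and the assumed X + X = X
    if x == PHI:
        return y
    if y == PHI or x == y:
        return x
    return ("alt", x, y)

def cat(x, y):  # concatenation, simplified with II.10 and II.12
    if x == PHI or y == PHI:
        return PHI
    if x == LAM:
        return y
    if y == LAM:
        return x
    return ("cat", x, y)

def star(x):
    return ("star", x)

def eta(x):
    """True when the expression contains lambda (rules II.7 to II.9)."""
    tag = x[0]
    if tag in ("lam", "star"):
        return True
    if tag in ("phi", "lit"):
        return False
    if tag == "alt":
        return eta(x[1]) or eta(x[2])
    return eta(x[1]) and eta(x[2])  # concatenation

def deriv(x, a):
    """Derivative of x with respect to the literal a (rules II.1 to II.5)."""
    tag = x[0]
    if tag in ("phi", "lam"):
        return PHI
    if tag == "lit":
        return LAM if x[1] == a else PHI
    if tag == "alt":
        return alt(deriv(x[1], a), deriv(x[2], a))
    if tag == "star":
        return cat(deriv(x[1], a), x)
    left = cat(deriv(x[1], a), x[2])  # rule II.4
    return alt(left, deriv(x[2], a)) if eta(x[1]) else left

def build_machine(r, literals, limit=50):
    """Steps one to four: states are the distinct derivatives of r."""
    states, transitions, i = [r], {}, 0
    while i < len(states):
        for a in literals:
            d = deriv(states[i], a)
            if d not in states:
                if len(states) >= limit:
                    raise RuntimeError("state limit reached")
                states.append(d)
            transitions[(i, a)] = states.index(d)
        i += 1
    outputs = [1 if eta(s) else 0 for s in states]
    return states, transitions, outputs

A, B = lit("A"), lit("B")
R = alt(A, alt(cat(B, B), cat(B, cat(A, cat(star(B), A)))))  # A + BB + BAB*A
states, transitions, outputs = build_machine(R, "AB")
```

For R = A + BB + BAB*A the sketch finds five expressions: R, λ, B + AB*A, φ, and B*A. The state for φ collects every rejected sequence.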
For the example shown in Figure II.1 the following expressions are obtained.

II.14  R = A + BB + BAB*A

II.15  D/A/(R) = λ, contains lambda

II.16  D/B/(R) = B + AB*A

II.17  D/BA/(R) = B*A

II.18  D/BB/(R) = λ, contains lambda

II.19  D/BAA/(R) = λ, contains lambda

II.20  D/BAB/(R) = B*A = D/BA/(R)

In step 1 the regular expression is associated with the initial state q1. In
the second step the derivative with respect to A was formed. As this is a new
expression it was associated with state q2. Also this expression contains lambda
so this state has an output of 1. The literal A was used to generate this expression
so a transition from q1 to q2 occurs for the input A. Repeating the process, state
q3 is generated and it is connected to q1 by the symbol B.

In step four the derivative, D/BA/(R) = B*A, is formed. As this is a new
expression it is assigned to state q4. By rule II.6 this derivative can be formed
by, D/BA/(R) = D/A/(D/B/(R)) = D/A/(B + AB*A) = B*A. As the
expression assigned to state q4 is formed by taking the derivative (with respect to A)
of the expression assigned to state q3 we conclude that a transition occurs from
q3 to q4 for the input symbol A.

In keeping with the notation used with REXPRO the slash will be used to
indicate the name of the derivative instead of using subscripts; i.e., D/A/(R) =
D_A(R).
The process is continued in steps 5, 6, and 7. The expressions formed in
each of these steps are the same as ones formed in the previous steps. In this case
the expression is assigned to the same state as it was assigned to in the earlier
step. For example, the expression formed in step 5 is assigned to state q2 and a
line is drawn to connect q3 to q2 to indicate the transition for the input B.

The expression assigned to state q2 contains lambda so this state is assigned
an output of 1. None of the expressions associated with the remaining states
contains lambda, so they are assigned an output of 0.

Via this set of operations it has been possible to regenerate the flow-graph
of Figure II.1. For the interested reader a more detailed discussion of the
derivative of the regular expression is given in [1] and [2]. In the following
section several practical, but simplified, problems will be studied. In these
problems the machine is not specified beforehand, but the regular expression is
used to design a machine to meet the given problem.

Examples  Using  the  Regular  Expression 

Problem 1. In a small, general purpose computer, such as the SCC 650,
the instruction repertoire consists of three classes: the nonmemory instructions,
the memory reference instructions, and the indirect memory instructions. The
nonmemory instructions are those which do not refer to data stored in the memory,
while the memory reference instructions decode part of the instruction to find the
address where the data is stored. The indirect instructions first find the data, as
for the memory reference instructions, and then interpret this data as the address
where the data is stored in the memory.
A sub-section of the control system is needed to control whether the memory
address is to be obtained from the PC register or the LC register. The LC, location
counter, register contains the address of the data and the PC, program counter,
gives the address for the next instruction. As part of the computer there is an
instruction decoder which will indicate which class the instruction is in. The
classes will be encoded as N, M, or I for the nonmemory, the memory, and the
indirect instructions, respectively. There is also a memory control system which
indicates when the memory has finished its read (and write) operation. This
completion will be assigned the character F.

Thus the machine which will control the use of the registers has input
sequences of N, or MF, or IFF. Or in terms of the regular expression, R =
N + MF + IFF. This is not entirely correct as the machine must be able to accept
an arbitrarily large number of these sequences. This problem is solved by the use
of the star operator.

Thus the expression,

R = (N + MF + IFF)*

is obtained. In Figure II.3 the derivatives of this expression are listed as they
were obtained from REXPRO.
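The expression R = (N + MF + IFF)* can be tried out with Python's standard `re` module, in which "|" plays the role of the "+" used in this paper. This is only an illustrative check; the function name `accepted` is hypothetical and the signal sequence is assumed to be given as a plain string of the characters N, M, I, and F.

```python
import re

# (N + MF + IFF)* in Python's regular expression syntax.
CYCLES = re.compile(r"(?:N|MF|IFF)*")


def accepted(signals):
    """True when the whole signal string is a run of complete instruction cycles."""
    return CYCLES.fullmatch(signals) is not None


for example in ["", "N", "MF", "IFF", "NMFIFFN", "M", "IF", "MFI"]:
    print(repr(example), accepted(example))
```

The empty sequence is accepted because the starred expression contains lambda, while a partial cycle such as "M" or "IF" is not.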


The problem of how, and when, the registers are set will not be considered here.

The computer uses the symbol \ to indicate the symbol λ; a second printed
symbol is used to indicate the symbol φ.
R-EXP PROGRAM--READY
R=(N+MF+IFF)*#
EXECUTE

CHECK COPY..

D((N+MF+IFF)*)
D/N/
D/N/=
(N+MF+IFF)*
CONTAINS \
D/M/
D/M/=
F(N+MF+IFF)*
D/F/
D/F/=
D/I/
D/I/=
FF(N+MF+IFF)*
D/MN/
D/MN/=
D/MM/
D/MM/=
D/MF/
D/MF/=
(N+MF+IFF)*
CONTAINS \
D/MI/
D/MI/=
D/IN/
D/IN/=
D/IM/
D/IM/=
D/IF/
D/IF/=
F(N+MF+IFF)*
D/II/
D/II/=
D/IFN/
D/IFN/=
D/IFM/
D/IFM/=
D/IFF/
D/IFF/=
(N+MF+IFF)*
CONTAINS \
D/IFI/
D/IFI/=

Figure II.3 Derivatives of R = (N + MF + IFF)*
Figure II.4 Flow-graph for Memory Control System

Figure II.4 shows the flow-graph that was obtained from the derivatives. The
state with a 1 output is the one for which the PC register is used. The LC
register will be used with the rest of the states. The state assigned to the null
set, φ, is entered when illegal sequences of symbols are found. This may be used
for error checking.

Problem  2.  The  reader  should  be  aware  of  the  fact  that  the  finite  state, 
sequential  machine  is  not  restricted  to  hardware  designs,  but  that  the  execution  of 
a  computer  program  can  be  studied  as  a  sequential  machine. 

In computing the eta function it is necessary to test a sub-sequence of the
regular expression to see if it contains lambda. The possible input characters in
this sub-sequence are the literals, which will be denoted by the symbol A; the phi,
denoted by P; the lambda, denoted by L; the star operator, denoted by S; and any
other operators, denoted by O.

For  the  sequence  to  contain  lambda  it  must  contain  only  terms  of  the  form 
AS,  L,  LS,  or  PS  and  an  arbitrarily  large  number,  but  at  least  one,  of  these  terms. 
Thus,  the  expression, 


(AS  +  L  +  LS  +  PS)  (AS  +  L  +  LS  +  PS)* 


is  generated. 

An additional requirement is that the sequence be terminated by an operator.
Thus,

(AS + L + LS + PS)(AS + L + LS + PS)*O


This is part of the function of the subroutine $BETAT described in Chapter IV
and Appendix D.
To this expression the NULL character, denoted by N, is added for
completeness. The NULL character has no intrinsic value, but is a blank, or spacing,
character which is placed in the sequence to fill in any unused locations in the
sequence. As an arbitrary number of NULL's, including zero, may appear in the
sequence the NULL will be placed in the sequence as N*. These NULL's can be
placed in the sequence between any and all of the other characters, thus the
regular expression given below is obtained.

R = (N*AN*SN* + N*LN* + N*LN*SN* + N*PN*SN*)
    (N*AN*SN* + N*LN* + N*LN*SN* + N*PN*SN*)*N*O

An equivalent, but simpler, form for the regular expression is,

R = N*Z(N*Z)*N*O, where
Z = (AN*S + L + LN*S + PN*S)
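The claim that the two forms are equivalent can be spot-checked by brute force. The sketch below is an added illustration using Python's `re` module; it compares the two expressions on every string over the six characters A, L, P, S, N, O up to a chosen length. Agreement on short strings is evidence of equivalence, not a proof, and the names used are hypothetical.

```python
import re
from itertools import product

# Long form: each term carries its own NULL padding.
TERM_LONG = r"(?:N*AN*SN*|N*LN*|N*LN*SN*|N*PN*SN*)"
R_LONG = re.compile(TERM_LONG + TERM_LONG + r"*N*O")

# Short form: R = N*Z(N*Z)*N*O with Z = AN*S + L + LN*S + PN*S.
Z = r"(?:AN*S|L|LN*S|PN*S)"
R_SHORT = re.compile(r"N*" + Z + r"(?:N*" + Z + r")*N*O")


def same_language(max_length, alphabet="ALPSNO"):
    """True if both forms accept exactly the same strings up to max_length."""
    for n in range(max_length + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            long_ok = R_LONG.fullmatch(word) is not None
            short_ok = R_SHORT.fullmatch(word) is not None
            if long_ok != short_ok:
                return False
    return True


print(same_language(5))
```

A sequence such as "LNNPSNO" is accepted by both forms, while "AO" is rejected because a literal that is not followed by the star does not contain lambda.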


Figure II.5 shows the derivatives which were calculated for this problem.
Figure II.6 shows the resulting flow-graph. After a sequence has been applied to
this machine the final state will either be q5 or q7. State q5 has a one output
indicating that the sequence contained lambda; state q7 produces a zero output
for those sequences which do not contain lambda.

It is now a simple matter for the programmer to translate the flow-graph in
Figure II.6 to a program which will calculate the eta function. Each of the
states, except the terminal states q5 and q7, is translated into a subprogram which
will read the next character in the sequence. After this character is read it is
tested by the subprogram to determine which subprogram the program will transfer
to. The transfers from subprogram to subprogram are formed directly from the
state-to-state transitions as shown on the flow-graph. The terminal states (q5 and
q7) are translated into subprograms which display the results of the test (e.g., a
value is returned to a calling program or an appropriate message is printed).

R-EXP PROGRAM--READY
R=N*Z(N*Z)*N*O#
Z=AN*S+L+LN*S+PN*S#
EXECUTE

CHECK COPY..

D(N*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*O)
D/N/
D/N/=
(N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*O
D/A/
D/A/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/S/
D/S/=
D/L/
D/L/=
(\+(N)*S)(N*(AN*S+L+LN*S+PN*S))*N*O
D/P/
D/P/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/O/
D/O/=
D/AN/
D/AN/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/AS/
D/AS/=
(N*(AN*S+L+LN*S+PN*S))*N*O
D/AA/
D/AA/=
D/AL/
D/AL/=
D/AP/
D/AP/=
D/AO/
D/AO/=
D/ASN/
D/ASN/=
((N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*O+(N)*O)
D/ASA/
D/ASA/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/ASS/
D/ASS/=
D/ASL/
D/ASL/=
(\+(N)*S)(N*(AN*S+L+LN*S+PN*S))*N*O
D/ASP/
D/ASP/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/ASO/
D/ASO/=
\
CONTAINS \
D/ASNN/
D/ASNN/=
((N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*O+(N)*O)
D/ASNA/
D/ASNA/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/ASNS/
D/ASNS/=
D/ASNL/
D/ASNL/=
(\+(N)*S)(N*(AN*S+L+LN*S+PN*S))*N*O
D/ASNP/
D/ASNP/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/ASNO/
D/ASNO/=
\
CONTAINS \
D/LN/
D/LN/=
((N)*S(N*(AN*S+L+LN*S+PN*S))*N*O+((N)*(AN*S+L+LN*S+PN*S)
(N*(AN*S+L+LN*S+PN*S))*N*O+(N)*O))
D/LA/
D/LA/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/LS/
D/LS/=
(N*(AN*S+L+LN*S+PN*S))*N*O
D/LL/
D/LL/=
(\+(N)*S)(N*(AN*S+L+LN*S+PN*S))*N*O
D/LP/
D/LP/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/LO/
D/LO/=
\
CONTAINS \
D/LNN/
D/LNN/=
((N)*S(N*(AN*S+L+LN*S+PN*S))*N*O+((N)*(AN*S+L+LN*S+PN*S)
(N*(AN*S+L+LN*S+PN*S))*N*O+(N)*O))
D/LNA/
D/LNA/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/LNS/
D/LNS/=
(N*(AN*S+L+LN*S+PN*S))*N*O
D/LNL/
D/LNL/=
(\+(N)*S)(N*(AN*S+L+LN*S+PN*S))*N*O
D/LNP/
D/LNP/=
(N)*S(N*(AN*S+L+LN*S+PN*S))*N*O
D/LNO/
D/LNO/=
\
CONTAINS \

Figure II.5 Derivatives of R = N*Z(N*Z)*N*O
CHAPTER III


STRING PROCESSING

The preceding chapters discussed the use of the regular expression and how
the derivative can be formed by using manual manipulations of the regular
expression. The problem is to write a program to perform these manipulations. The digital
computer is designed to handle numerical formulas to which a numerical answer can
be assigned. The problem X = A + B(A + C), where values have been assigned
to A, B, and C, can be easily solved on the computer by means of adding and
multiplying the values of A, B, and C. The results of these operations are then
placed in the location reserved for the answer, X.

In the case of the derivative of the regular expression we have a string of
characters which represents the regular expression; in general this string is of
unknown length. Before studying the process by which the computer forms the
derivative the concept of string processing is introduced. String processing
provides a method by which the computer can perform the basic manipulations on
the string (e.g., storing, adding, or removing characters from a string) needed to
form the derivative.

A string is defined as a series of characters that are written in a linear form
(e.g., this sentence is a string). The symbol S will be used to represent the
string. For this report only strings that are used to represent the regular expression
will be considered. These are strings which contain only the following characters:
the literals, φ, λ, +, ., *, (, ), [, ], D, and R. These characters form the
vocabulary of the string. Also the requirement of validity will be implied.
A valid string is one in which the operators are used properly. This implies
that the parentheses and brackets are properly formed and that the operators are
used only in a valid context. A note should be made that the validity of a string
is determined only by its context and not its content.

To aid in the discussion of the use of strings we introduce the terms: alpha
character, operator, term, and level. The alpha characters consist of the literals
plus λ and φ. The remaining characters (i.e., the +, ., *, (, ), [, ], and D)
are the operators and delimiters.

A term is a logical unit of the string. It is a sub-string consisting of alpha
characters and the star operator, or any sub-string which is contained within a set
of parentheses or brackets. If a term is followed by the star, the star is considered
part of the term. The level provides a method of ranking the terms in a string.
Terms that are connected by one of the operations of AND, OR, or concatenation
are all of the same level. When several terms are grouped, via a set of
parentheses (or brackets), a new term is formed. The new term is of a higher level than
the level of the individual terms. The phrase "to go down in level" means that,
rather than testing the large term, the several terms that are enclosed in a set of
parentheses are to be tested.

The subroutine $SLVL has been written for REXPRO to perform the operation
of reading a term. It returns with the first and last characters of the term. $SLVL
also indicates if the term is followed by more terms of the same level.
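The reading of a term can be sketched as follows. This is a hypothetical illustration of the definitions of term and level given above, not the REXPRO subroutine itself: the function name `read_term`, the use of a Python string in place of linked cells, and the returned tuple are all assumptions, and the input is assumed to be a valid string without the D operator.

```python
CLOSER = {"(": ")", "[": "]"}   # opening delimiter -> matching closing delimiter
DELIMITERS = set("()[]")
CONNECTIVES = set("+.")         # OR and AND; concatenation is juxtaposition


def read_term(s, start=0):
    """Return (first, last, more) for the term of s that begins at index start.

    first and last are the indices of the first and last characters of the
    term; more is True when further terms of the same level follow it.
    """
    i = start
    if s[i] in CLOSER:
        # A grouped term: scan to the matching closing delimiter.
        depth = 1
        while depth > 0:
            i += 1
            if s[i] in CLOSER:
                depth += 1
            elif s[i] in CLOSER.values():
                depth -= 1
    else:
        # A run of alpha characters and stars.
        while i + 1 < len(s) and s[i + 1] not in DELIMITERS | CONNECTIVES:
            i += 1
    # A star that follows the term is part of the term.
    while i + 1 < len(s) and s[i + 1] == "*":
        i += 1
    more = i + 1 < len(s) and s[i + 1] not in CLOSER.values()
    return start, i, more


print(read_term("(N+MF+IFF)*"))
print(read_term("F(N+MF+IFF)*"))
```

To "go down in level" with this sketch, one would call `read_term` again starting just inside the opening parenthesis of a grouped term.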

A method is needed to store and operate on the strings in the computer.
This is accomplished by a set of linked cells. A cell is simply a location which
can hold one, and only one, character of the string.


To indicate how the individual cells are joined, to form the string, a linking
system is used. The simplest linking system consists of linearly ordering the cells;
i.e., the first cell is used to store the first character of the string, the second cell
stores the second character of the string, etc.

In REXPRO the number of literals was restricted to 16 and two additional
characters were added. These are the end flag (represented by a special symbol),
which is used to indicate the end of a string, and the NULL character. Thus the
vocabulary of REXPRO contains fewer than 32 separate characters. These characters
can be encoded by a five bit binary word. Appendix A contains a list of the
vocabulary and the codes used to store these characters.

The memory system of the SCC 650 consists of a main core with 4096 12-bit
words. For maximum efficiency, with respect to memory, two cells are stored in
each memory word. To address a given cell a two-word addressing scheme is used.
The MP (memory point) gives the address of the memory word and the HP (half
point) indicates which cell in the word is being addressed. A HP equal to zero is
used for the first cell and a HP equal to one is used to reference the second cell.
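The MP/HP addressing can be illustrated with a small packing sketch. It assumes that each 12-bit word is split into two 6-bit cells with the first cell (HP = 0) in the high-order half; the actual bit layout on the SCC 650 is not stated here, and the function names and the character codes in the example are hypothetical.

```python
CELL_BITS = 6        # a 12-bit word holds two 6-bit cells
CELL_MASK = 0o77


def pack_word(first, second):
    """Pack two cell values into one 12-bit word (first cell in the high half)."""
    return ((first & CELL_MASK) << CELL_BITS) | (second & CELL_MASK)


def read_cell(memory, mp, hp):
    """Read the cell addressed by MP (word address) and HP (0 or 1)."""
    word = memory[mp]
    return (word >> CELL_BITS) & CELL_MASK if hp == 0 else word & CELL_MASK


def write_cell(memory, mp, hp, value):
    """Write one cell without disturbing the other cell in the same word."""
    word = memory[mp]
    if hp == 0:
        memory[mp] = ((value & CELL_MASK) << CELL_BITS) | (word & CELL_MASK)
    else:
        memory[mp] = (word & (CELL_MASK << CELL_BITS)) | (value & CELL_MASK)


memory = {0o5000: 0}
write_cell(memory, 0o5000, 0, 0o25)   # hypothetical code for one character
write_cell(memory, 0o5000, 1, 0o12)   # hypothetical code for the next one
print(oct(memory[0o5000]))
```

A five bit character code leaves one spare bit in each 6-bit cell, which is the room needed for a flag bit.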

A  prefix  is  used  to  denote  the  different  locations  in  the  string.  These  are 
LE,  for  level  end;  L,  for  level  start;  LR,  for  last  read;  R,  for  read;  T  and  T2,  for 
temporary;  and  W,  for  write. 

To finish the description of string storage we must consider what happens
when characters are added or removed from a string. This can be accomplished
by shifting the characters in the string; however, this introduces the problem of
having the end of the string blocked by another string.
The problem of removing characters is considered first. If part of the string
is 'D((WXYZ))' we see that the string contains an extra set of parentheses which
should be removed. This string is shown in Figure III.1a as it would be stored in
memory. Each horizontal block represents a memory word containing two cells.
The number to the left gives the MP location of that word.

In the expansion of the string the inner set of parentheses at '5001/0 (the
MP/HP) and at '5003/1 are to be removed. When these are removed something
must be placed in these cells so the string remains continuous. The new character
is the NULL character. The NULL character has no value, but is simply a
character used to fill out the string. Figure III.1b shows the string after the
expansion with the symbol N being used to represent the NULL character.
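The removal of the inner parentheses can be sketched with a Python list standing in for the cells. The starting address '5000 is an assumption chosen so that the inner parentheses fall at '5001/0 and '5003/1 as quoted above; the NULL stand-in and the function names are hypothetical.

```python
NULL = "\0"   # stand-in for REXPRO's NULL character


def address(index, base=0o5000):
    """Map a cell index to its (MP, HP) pair, two cells per memory word."""
    return base + index // 2, index % 2


def remove_cell(cells, mp, hp, base=0o5000):
    """Overwrite the cell at MP/HP with NULL so that no other cell moves."""
    cells[(mp - base) * 2 + hp] = NULL


def read_string(cells):
    """Read the string back, skipping the NULL fillers."""
    return "".join(c for c in cells if c != NULL)


cells = list("D((WXYZ))")
remove_cell(cells, 0o5001, 0)   # inner "("
remove_cell(cells, 0o5003, 1)   # inner ")"
print(read_string(cells))
```

The list keeps its original length of nine cells, so nothing stored after the string has to be shifted.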

The jump-flag is introduced for those cases where it is desired to add
characters to the string. The jump-flag (or more simply the flag) is represented by
placing a one in the sixth bit of the second cell in the memory word. In the
examples it is indicated by using an F in the cell. Placing the flag in a cell
indicates that the following word does not contain the remaining part of the string, but
rather an address giving the location where the string is continued.

Figure III.2a shows the string 'D(ABC)', followed by the end flag, as it appears
in the memory. For this example it is desired to replace the character 'B' with the
string 'WXYZ'. This is accomplished by removing the character 'B' and replacing it
with 'F-W' (i.e., a flag and the symbol W) and saving the following two characters;
the 'C)'. These characters are replaced with the address '5500 where the string is
continued.

Any change in the string is called "expanding the string".

The apostrophe is used to represent numbers to the base eight, e.g., '5001 =
5001 (base 8) = 2561 (base 10).

Figure III.1 The Use of the NULL Character.

At location '5500/0 the remaining characters of the string are added along
with the two characters, the 'C)', which were removed from the main string. At
location '5502/1 a flag is placed in the string. The address in the following
location refers back to the remaining portion of the string. The resulting storage
in the memory is shown in Figure III.2b. This string is read as 'D(AWXYZC)',
followed by the end flag.
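The way a flagged string is read back can be sketched in Python. The memory layout below is reconstructed from the description in the text (the original string is assumed to start at '5000, so that the word holding 'C)' is '5002 and the string resumes at '5003); the dictionary representation, the END and NULL stand-ins, and the function names are hypothetical.

```python
NULL = None        # stand-in for the NULL character
END = "<end>"      # stand-in for the end flag


def cells(first, second, flag=False):
    """A memory word holding two cells; flag is the jump-flag of the second cell."""
    return {"first": first, "second": second, "flag": flag}


def jump(target):
    """A memory word holding the address where the string is continued."""
    return {"target": target}


memory = {
    0o5000: cells("D", "("),
    0o5001: cells("A", "W", flag=True),    # 'F-W' replaces the character B
    0o5002: jump(0o5500),                  # the saved 'C)' made room for this
    0o5003: cells(END, NULL),
    0o5500: cells("X", "Y"),
    0o5501: cells("Z", "C"),
    0o5502: cells(")", NULL, flag=True),   # 'F-N' at '5502/1
    0o5503: jump(0o5003),                  # back to the rest of the string
}


def read_string(memory, mp):
    """Follow the cells and jump addresses from word mp up to the end flag."""
    out = []
    while True:
        word = memory[mp]
        for ch in (word["first"], word["second"]):
            if ch == END:
                return "".join(out)
            if ch is not NULL:
                out.append(ch)
        # A flag means the next word is an address, not part of the string.
        mp = memory[mp + 1]["target"] if word["flag"] else mp + 1


print(read_string(memory, 0o5000))
```

Because every segment begins on a word boundary, the sketch never needs to store a half-point with a jump address.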

Several comments should be made about this example. First, as there is no
space left in this method to store information about the half-point the requirement
is added that all strings, and string segments, have to start with a half-point of
zero. Second, a flag may be used only in the second cell; thus, we see the
use of the 'F-N' (flag-NULL) at location '5502/1.

This process of removing a character in the string and adding an address is
termed "setting a break-point". This is handled by the REXPRO subroutine
$BPSET which not only adds the flag and jump address, but also saves the
characters that were removed from the string to give space for the address. The
process of adding the jump at the end of the new string segment is termed
"returning the break-point" and is executed by the subroutine $RTBP. This routine also
adds the characters which were removed from the string by $BPSET.

The subroutines $READ (string read) and $WRITE (string write) are two
additional routines which are used in processing the string. These four routines
form the basis for all of the string operations performed by REXPRO. A description
of the subroutines $BPSET, $RTBP, and $SLVL is given in Appendix D.
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Figure  III. 2  An  Example  of  the  Use  of  the  Jump-Flag. 
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CHAPTER  IV 


USING  THE  COMPUTER  TO  FORM  THE  DERIVATIVES 


The previous chapters introduced the derivative of the regular expression and the concept of string processing. We are now ready to study the processes by which the computer forms the derivative. The computer operation will be studied by following an example through the steps needed to form the derivative.

The strategy used in forming the derivative is to repeatedly apply the following rules until all of the terms are of the form given in Rule IV.7b. These rules are,

IV.1    D((P)*) = D((P))(P)*

IV.2a   D(P₁P₂) = D(P₁)P₂, if η(P₁) = φ

IV.2b   D(P₁P₂) = (D(P₁)P₂ + D(P₂)), if η(P₁) = λ

IV.3    D(P₁ + P₂) = (D(P₁) + D(P₂))

IV.4    D([P]) = [D(P)]

IV.5    *

IV.6    D((P)) = D(P)

IV.7a   D(A₁*A₂B₁...Bₙ) = D((A₁)*A₂B₁...Bₙ), where A₁, A₂ are literals and B₁...Bₙ are literals or the star operator.

IV.7b   D(A₁A₂B₁...Bₙ), where A and B are as defined above.

IV.8    D(P₁ . P₂) = (D(P₁) . D(P₂))

* The first eight rules are numbered according to the subroutines which use these rules. Due to a reorganization of the subroutines there is no Rule IV.5.

After all of the terms are of the type given in Rule IV.7b the following rules are used as the final step in forming the derivative.

IV.9a   D/A/(A₁A₂...Aₙ) = λ(A₂...Aₙ), if A = A₁

IV.9b   D/A/(A₁A₂...Aₙ) = φ, if A ≠ A₁

We should note that these rules differ slightly from those given in Chapter II, but they are derived from the rules in Chapter II. These rules were modified so that they would be compatible with the string processing used in REXPRO.

Expanding  The  Derivative 

We will now consider the example D/A/(R), where R = (A*B + C)*AB*. This would be stored in the computer as,

'D((A*B + C)*AB*)@'

and the "name of the derivative" is 'A'. It should be noted that the derivative name is not stored as part of the derivative; e.g., as a subscript. Rules IV.1 through IV.8 are not a function of the particular derivative being formed; thus, the name can be separated from the derivative and retrieved only where it is needed.

It is also noted that the derivative is stored with 'D(' and ')' added to the string. This provides easily detected delimiters to outline the derivative. A search is made of the term(s) following the 'D(' and a list of the characteristics of these terms is made. For this example we find the two terms '(A*B + C)*' and 'AB*'. Another check is made and we find that the terms are connected by juxtaposition; i.e., the terms are concatenated and Rule IV.2 applies. To determine if Rule IV.2a or IV.2b is to be used the eta function must be evaluated. The method of evaluating eta will be deferred until the end of the chapter, but it is found that η((A*B + C)*) = λ. Thus the expression,

'(D((A*B + C)*)AB* + D(AB*))@',

is formed by applying Rule IV.2b.

It is pointed out that the expansion of the string, produced by Rule IV.2b, is formed by using the techniques presented in Chapter III; i.e., adding and removing characters from the string and copying portions of the string. The exact methods that are used in the expansion process are reserved for the description of the subroutine $DF2 in Appendix D.

The expression is still not of the form given in Rule IV.7b, thus the process is repeated. The string is scanned from left to right until the first delimiter, the 'D(', is found. The following operand, '(A*B + C)*', is then read. This operand consists of a single term and that term is operated on by the star. Thus Rule IV.1 is used to produce the expanded string,

'(D((A*B + C))(A*B + C)*AB* + D(AB*))@'

The scanning continues until the second delimiter is found. The following term, 'AB*', is then read. This operand appears to follow Rule IV.1 as it contains only one term and is followed by a star. The question arises as to what characteristic will differentiate between terms of the form '(A*B + C)*' and the form 'AB*'. The difference lies in the fact that in the first case the entire term is operated on by the star, while in the second case only the one literal is operated on by the star. This characteristic is easily checked by determining if the first character in the term is '('. The routine which reads the term (the subroutine $SLVL) saves the first character for this test. We see that 'AB*' does not follow Rule IV.1, but Rule IV.7. The subroutine which is used to expand this form finds that the first literal is not operated on by a star. Under this condition Rule IV.7b is invoked and no change is made.

The scan is continued and no more derivative delimiters are found before the end flag (the '@'). Thus a new pass is started and the process is repeated. The first operand which is found is '(A*B + C)'. This has the characteristics of having only one term, not operated on by a star, and it is enclosed in parentheses. These characteristics match those of Rule IV.6, thus the result,

'(D(A*B + C)(A*B + C)*AB* + D(AB*))@'

The scan is continued and the only term found is 'AB*', to which Rule IV.7b is applied and no change is made.

A new pass is then started and the operand 'A*B + C' is found. This operand consists of the terms 'A*B' and 'C' and they are connected by the OR operator. For this condition Rule IV.3 is used to produce,

'((D(A*B) + D(C))(A*B + C)*AB* + D(AB*))@'

Again no change is made in the last operand and a new pass is started. The first operand found is 'A*B'. This operand consists of a single term with the first character a literal. Thus Rule IV.7 is used. The subroutine which processes this rule scans the term and finds that the first character is operated on by the star and Rule IV.7a is invoked. This produces the string,

'((D((A)*B) + D(C))(A*B + C)*AB* + D(AB*))@'

The process is repeated through several more passes and the following expression is obtained.

'(((D(A)(A)*B + D(B)) + D(C))(A*B + C)*AB* + D(AB*))@'

A final pass is then made. During this pass all of the operands are found to follow Rule IV.7b. Thus no changes are made in the string. This indicates the end of the expansion operations. The final step in obtaining the derivative is to implement Rules IV.9a and IV.9b. These rules are implemented by the subroutine $MATCL. In this routine a single scan is made of the derivative. When the derivative delimiter is found the following literal is compared with the name of the derivative (in this example the literal A). If the two characters match, the string is modified by Rule IV.9a; otherwise Rule IV.9b is used.

This produces,

'(((λ(A)*B + φ) + φ)(A*B + C)*AB* + λ(B*))@'

After this derivative is formed the following rules are used to simplify the results.

IV.10   Pφ = φP = φ

IV.11   P + φ = φ + P = P

IV.12   λP = Pλ = P

IV.13   P . φ = φ . P = φ

The simplifications are performed in a manner similar to the operation of expanding the string; i.e., repeated passes are made until a pass is found in which no changes were made. The final result is,

D/A/(R) = '((A)*B(A*B + C)*AB* + B*)@'


To  summarize  the  operation  of  forming  the  derivative: 

A.  Search the string to find the derivative delimiter and test the
following operand. From this test determine which rule applies.
Appendix D describes in detail the subroutine $DCLAS which
performs this operation.

B.  Expand the string according to the appropriate rule. The subroutines
$DF1, $DF2, $DF38, $DF4, $DF6, and $DF7 are used
to implement the rules.

C.  Repeat A and B until a pass is made with no change in the
string.

D.  The derivative is taken by the subroutine $MATCL which
implements Rules IV.9a and IV.9b.

E.  Simplify  the  string. 

In this example the derivative was taken with respect to the single character 'A'. If the derivative with respect to a string of characters is desired, for example D/AB/(R), the rule D/AB/(R) = D/B/(D/A/(R)) is used. In other words the derivative D/A/(R) is formed as given above. Then the process is repeated, using the above results, to form the derivative with respect to B.
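
The procedure summarized above can be cross-checked with a short sketch. The Python below is not the string-processing method used by REXPRO. It assumes the expression is already held as a nested tuple rather than as a packed string, it applies the rules recursively instead of in repeated passes, and it omits the AND operator and the brackets. All function names are hypothetical.

```python
PHI = ("phi",)    # the empty set
LAM = ("lam",)    # lambda, the empty string


def lit(c):
    return ("lit", c)


def star(p):
    return ("star", p)


def cat(p, q):
    """Concatenation, simplified as it is built."""
    if PHI in (p, q):
        return PHI                       # P phi = phi P = phi
    if p == LAM:
        return q                         # lambda P = P
    if q == LAM:
        return p                         # P lambda = P
    return ("cat", p, q)


def alt(p, q):
    """OR, simplified as it is built."""
    if p == PHI:
        return q                         # phi + P = P
    if q == PHI:
        return p                         # P + phi = P
    return ("alt", p, q)


def eta(p):
    """True when the expression contains lambda."""
    kind = p[0]
    if kind in ("lam", "star"):
        return True
    if kind in ("phi", "lit"):
        return False
    if kind == "cat":
        return eta(p[1]) and eta(p[2])
    return eta(p[1]) or eta(p[2])        # alt


def derive(p, a):
    """Derivative of p with respect to the single literal a."""
    kind = p[0]
    if kind in ("phi", "lam"):
        return PHI
    if kind == "lit":
        return LAM if p[1] == a else PHI             # final matching step
    if kind == "star":
        return cat(derive(p[1], a), p)               # star rule
    if kind == "alt":
        return alt(derive(p[1], a), derive(p[2], a)) # OR rule
    first = cat(derive(p[1], a), p[2])               # concatenation rule
    return alt(first, derive(p[2], a)) if eta(p[1]) else first


def derive_name(p, name):
    """D/AB/(R) = D/B/(D/A/(R)): apply the characters of the name in order."""
    for a in name:
        p = derive(p, a)
    return p


def show(p, level=0):
    """Render the tuple form as text; level 0 = OR, 1 = concatenation, 2 = star."""
    kind = p[0]
    if kind == "phi":
        return "φ"
    if kind == "lam":
        return "λ"
    if kind == "lit":
        return p[1]
    if kind == "star":
        return show(p[1], 2) + "*"
    if kind == "cat":
        s = show(p[1], 1) + show(p[2], 1)
        return "(" + s + ")" if level > 1 else s
    s = show(p[1], 0) + " + " + show(p[2], 0)
    return "(" + s + ")" if level > 0 else s


A, B, C = lit("A"), lit("B"), lit("C")
R = cat(star(alt(cat(star(A), B), C)), cat(A, star(B)))   # (A*B + C)*AB*

d_a = derive(R, "A")
d_ab = derive_name(R, "AB")
print(show(d_a))
print(show(d_ab))
```

Because the simplifying identities are applied while each node is built, no separate simplification pass is needed in this sketch; REXPRO instead expands the whole string first and simplifies afterwards.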

Calculating  the  Eta  Function 

A technique is now introduced which will allow the eta function to be calculated by the computer. The eta function is defined as,

IV.14   η(P) = λ, if λ ∈ P
        η(P) = φ, if λ ∉ P

IV.15   η(P₁P₂) = η(P₁) . η(P₂)

IV.16   η(P₁ + P₂) = η(P₁) + η(P₂)

IV.17   η(P₁ . P₂) = η(P₁) . η(P₂)

IV.18   η([P]) = [η(P)]

From the first rule we see that the eta function is a binary function; i.e., it has only two values, λ and φ. The other rules state that any eta function can be formed by finding the eta function for simpler terms.

We first consider the types of expression for which the eta function can be found directly.

A.  A simple term consisting only of alpha characters and the star operator; e.g., A*B, ABC, or A*B*. For this case we consider each alpha character at a time.

        η(A*) = λ, by the definition of the star operator
        η(A) = φ, A any literal or φ
        η(λ) = λ, by definition of the eta function

    For several characters which are concatenated we use Rule IV.15 and the fact that λ . φ = φ. From this we conclude that the eta value of a simple term equals lambda if, and only if, each alpha character is a lambda or is operated on by a star. This test can be easily performed on the computer by scanning each character from left to right and testing the characters to see if they are lambda or if they are alpha characters followed by a star. This is the method that was presented as the second example problem in Chapter II.

B.  A complex term of the form (P)*. Again by the definition of the star operator we find that η((P)*) = λ.

C.  A complex term of the form (P). For this case the eta function cannot be directly determined; i.e., the terms enclosed in the parentheses have to be tested to find the resulting eta function.

This process of testing the terms was written into the subroutine $BETAT. This subroutine returns one of three results: the eta value equals lambda, equals phi, or is undetermined.
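
The three cases can be sketched as a single classification function. This is an illustration in Python under stated assumptions, not the $BETAT routine itself: the term is passed as an ordinary text string, lambda is written as the character "λ", the caller is assumed to pass exactly one term, and the names classify_term and closing_index are hypothetical.

```python
LAMBDA, PHI, UNDETERMINED = "lambda", "phi", "undetermined"


def closing_index(term):
    """Index of the parenthesis that closes the one opening at term[0]."""
    depth = 0
    for i, ch in enumerate(term):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def classify_term(term):
    """Return LAMBDA, PHI or UNDETERMINED for a single term."""
    term = term.replace(" ", "")
    if term.startswith("("):
        rest = term[closing_index(term) + 1:]
        if rest == "":
            return UNDETERMINED          # case C: contents must be examined
        if set(rest) == {"*"}:
            return LAMBDA                # case B: whole term under a star
        raise ValueError("expected a single term")
    # Case A: scan left to right; every character must be lambda or starred.
    i = 0
    while i < len(term):
        j = i + 1
        while j < len(term) and term[j] == "*":
            j += 1                       # step over the star(s) after term[i]
        starred = j > i + 1
        if term[i] != "λ" and not starred:
            return PHI
        i = j
    return LAMBDA


for t in ("A*B", "A*B*", "(A*B + C)*", "(A* + B)"):
    print(t, "->", classify_term(t))
```

The scan for a simple term stops at the first character that is neither lambda nor starred, since one such character is enough to make the whole concatenation phi.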

For the cases with more complex expressions we will use the following rules.

IV.19   λ . X = X

IV.20   λ + X = λ

IV.21   φ . X = φ

IV.22   φ + X = X, where X is either lambda or phi

These rules are derived from the fact that the values of the eta function (i.e., lambda or phi) are binary values.

With these rules in mind we consider the example η((A* + B)A + C*). This would be stored in the computer as '((A* + B)A + C*)' with a pointer to the first left parenthesis at the start. The term '((A* + B)A + C*)' is tested and it is found to be undetermined. Thus it is necessary to "go down in level" to evaluate the expression. This is performed by stepping the pointer one location to the right. The term '(A* + B)' is then tested and it is also found to be undetermined.

The process of going down in level is repeated and the term 'A*' is tested and found to have an eta value of λ. This term is followed by the OR operator. From equation IV.20 we see that the result will be lambda, even without testing the remaining term. Thus the term 'B' is passed over without being evaluated and the operator ')' is read. This indicates the end of the present level and the need to determine "what to do next".

In this example the term '(A* + B)' was found to have a value of lambda and it is concatenated with the following term (the 'A'). By equation IV.15 we see that in evaluating concatenated terms they are treated as being "AND'ed". Thus by IV.19 we find that the value of the expression '(A* + B)A' is determined solely by the term 'A'. The term 'A' is tested and found to have a value of phi.

The value of that part of the expression which has been evaluated is phi and the following operator is OR, and by IV.22 we see that the result is determined solely by the following term. This term, the 'C*', is tested and found to have a value of lambda. The only remaining character is a right parenthesis, which indicates the end of the expression being evaluated. Therefore, the eta value of the expression being tested is λ.
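
The evaluation traced above can be sketched as a left-to-right scan that passes over any term whose value cannot affect the result. The Python below is an illustration only and does not reproduce the REXPRO subroutines. It assumes a well-formed expression using only literals, "+", parentheses and the star; it writes lambda as "λ"; and the optional `trace` list, which records the literals actually tested, is an addition made for this example.

```python
def eta(expr, trace=None):
    """Return True if expr contains lambda, skipping terms that cannot matter."""
    s = expr.replace(" ", "")

    def skip_group(i):
        """Index just past the parenthesized group that opens at s[i]."""
        depth = 0
        while True:
            if s[i] == "(":
                depth += 1
            elif s[i] == ")":
                depth -= 1
            i += 1
            if depth == 0:
                return i

    def factor(i, evaluate):
        """Evaluate, or pass over, one factor; return (value, next index)."""
        if s[i] == "(":
            end = skip_group(i)
            starred = end < len(s) and s[end] == "*"
            if starred or not evaluate:
                value = starred              # (P)* is lambda without going down
            else:
                value, _ = union(i + 1)      # go down one level
            i = end
        else:
            starred = i + 1 < len(s) and s[i + 1] == "*"
            value = starred or s[i] == "λ"
            if evaluate and trace is not None:
                trace.append(s[i])
            i += 1
        while i < len(s) and s[i] == "*":
            i += 1
        return value, i

    def term(i, evaluate):
        """Concatenated factors: once the value is phi the rest is passed over."""
        value = True
        while i < len(s) and s[i] not in "+)":
            v, i = factor(i, evaluate and value)
            value = value and v
        return value, i

    def union(i, evaluate=True):
        """OR'ed terms: once the value is lambda the rest is passed over."""
        value = False
        while True:
            v, i = term(i, evaluate and not value)
            value = value or v
            if i < len(s) and s[i] == "+":
                i += 1
            else:
                return value, i

    value, _ = union(0)
    return value


tested = []
value = eta("((A* + B)A + C*)", tested)
print(value, tested)
```

For the example expression the literal B never appears in `tested`, which corresponds to the term 'B' being passed over without being evaluated.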

This example has shown a method by which the eta function can be found by a process of scanning the expression from left to right and using Rules IV.19 to IV.22 to determine which terms are tested and which are passed over. While this example was admittedly simple the same technique can be used to evaluate any expression. For the interested reader Appendix D describes the operation of the subroutines $ETATT and $BETAT. The subroutine $ETATT performs the operation of deciding "what to do next". $BETAT is called by $ETATT to evaluate single terms.

CHAPTER V


REXPRO  OPERATION 


Memory  Organization 

The 4096-word, 12-bit memory of the SCC 650 is divided into three distinct areas for this program, as shown in Figure V.1. The first section consists of the first 64 words. These words can be addressed from the entire computer memory via the direct memory instructions.

The direct memory area is further divided into four sub-sections. The first section occupies locations '0 to '17 and holds the symbol table. This table stores the literals that have been entered. The literals are stored as trimmed ASCII characters. These literals are stored in the string as a four-bit character which corresponds to their location in the table. A flag is placed in the seventh bit if the character was seen as a part of the regular expression string as opposed to a substitution string.

The second section of the direct memory contains the substitution table and it is stored from '20 to '37. This table contains the addresses of the substitution strings. Each such string is named by a character placed in the regular expression string, and it is substituted into the main string in place of that character. The table is referenced by adding '20 to the location in the symbol table where the corresponding character is stored.

For example, if the character 'A' is the third character which is stored in the symbol table it will be replaced with '02 and its substitution string address will be stored at '22. If no substitution string has been stored the corresponding location in the substitution table contains '0000.

The third section contains the converted (via the symbol table) literals that are used to name the derivatives. As this data is used at a point in the program execution at which the substitution table has already been used, the derivative name is also stored from '20 to '37. The derivative name is stored as a string, packed two characters per word. Therefore, the derivative name is also referred to as the derivative string. A maximum length of 31 characters can be used to specify the derivative name (one cell being reserved for an end flag).
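
The table addressing and the name-length limit described above reduce to simple octal arithmetic, shown here as a small Python check. The constant and function names are hypothetical; only the addresses and sizes are taken from the text.

```python
SUBSTITUTION_BASE = 0o20   # substitution table occupies '20 to '37
TABLE_SIZE = 0o20          # 16 words in each table


def substitution_address(symbol_index):
    """Direct-memory address of the substitution entry for a symbol-table index."""
    if not 0 <= symbol_index < TABLE_SIZE:
        raise ValueError("the symbol table holds only 16 literals")
    return SUBSTITUTION_BASE + symbol_index


# 'A' stored as the third literal has index '02, so its entry is at '22.
addr = substitution_address(0o02)

# The derivative name reuses '20 to '37: 16 words, two characters per word,
# with one character cell reserved for the end flag.
max_name_length = TABLE_SIZE * 2 - 1

print(oct(addr), max_name_length)
```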

The last section of the direct memory (from '40 to '77) contains the non-string variables. The information contained in this section consists of string starting addresses, string working addresses, temporary character and address storage, input mode information, and a program start address. A detailed list of the information stored in this section is contained in Appendix B.

The next section of the memory is allocated to the storage of the program itself. This section runs from '100 to '4757. The program starts execution at '4704.

The  remainder  of  the  core  is  used  to  store  strings.  The  first  string  which  is 
stored  is  the  regular  expression  string.  This  is  followed  by  the  substitution  strings, 
if  any.  The  last  string  is  the  work  string.  The  work  string  is  a  copy  of  the  regular 
expression  string,  with  substitutions  made.  The  work  string  is  the  string  which  is 
expanded  to  form  the  actual  derivative. 

There is no fixed length for these strings. The only limitation on length is the fact that they have to fit in the area from '4760 to '7540. (The upper limit was designed to save the loader. It can be increased to give more working area or decreased to save other programs.)

Program Execution

The program is executed in four distinct phases as shown in Figure V.2.

Phase I. During the first phase (controlled by MAINLINE) the program initializes the direct memory by clearing the symbol table, the substitution table, and the string addresses. This phase also sets the input mode for the teletype keyboard, loads the program starting address (in the variable RESTRT) and sets NEW. NEW defines the free area where strings can be stored. At the end of Phase I the message "R-EXP PROGRAM--READY" is printed and the program enters Phase II.

Phase II. This phase is the main input phase and is controlled by the routine MAINLINE. During this phase either control words or strings may be entered into the computer. The program receives the first character from the teletype which is not a carriage return, line feed, space, or tape leader. A second character is then received. If the second character is a letter it is assumed to be part of a control word. Otherwise the input is treated as a string. The first two characters of the control word are stored. When a line feed is received the two characters are tested and the control word is executed. The functions of the control words are listed below:

"TAPE" - Input via paper tape
"KEYBOARD" - Input via the keyboard
"EXECUTE" - Enter Phase III
"TERMINATE" - Return to Phase I

If the operator makes a mistake or desires to make a change he may type a left arrow before the line feed. The operator can then enter a new string or control word. The entering of an incorrect control word will cause the following to be printed: "UNDEFINED CONTROL WORD".

Figure V.2

If the second character entered was not a letter, the program checks to see if it is the start of the string. The first string that is stored must be the regular expression string and it is denoted by an 'R', followed by optional spaces, followed by an '='. If the string does not contain an '=' the message "STRING DOESN'T CONTAIN = N.S." is printed. The "N.S." indicates that the string is not stored.
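
The decision between a control word and a string can be sketched as follows. This Python illustration works on a whole input line rather than character by character, does not model the tape leader or the editing arrows, and uses hypothetical names; the two-character test and the messages are those given in the text.

```python
CONTROL_WORDS = {
    "TA": "input via paper tape",
    "KE": "input via the keyboard",
    "EX": "enter Phase III",
    "TE": "return to Phase I",
}
SKIPPED = "\r\n "   # carriage return, line feed, space


def classify_line(line):
    """Return ("control", action), ("string", text) or ("error", message)."""
    text = line.lstrip(SKIPPED)
    if len(text) >= 2 and text[1].isalpha():
        # A letter in second position marks a control word; only the
        # first two characters are stored and tested.
        action = CONTROL_WORDS.get(text[:2])
        if action is None:
            return ("error", "UNDEFINED CONTROL WORD")
        return ("control", action)
    if "=" not in text:
        return ("error", "STRING DOESN'T CONTAIN = N.S.")
    return ("string", text)


for line in ("EXECUTE", "  R = (A*B + C)*AB*@", "HELLO", "R (A*B)@"):
    print(repr(line), "->", classify_line(line))
```

Since only two characters are compared, any word beginning with "EX" selects the same action in this sketch.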

After the equal mark any valid input character may be entered. These characters are listed in Appendix A along with the code in which they are stored. An invalid character will cause a '?' to be printed and the character is not stored. To aid in the loading of a string, a carriage return, a line feed, or a space may be entered at any time after the equal sign.

To correct mistakes in loading a string the operator may use a left arrow or a vertical arrow. The left arrow dumps the string that was being stored and allows a new string, or control word, to be entered. The vertical arrow removes the last character(s) which were entered; one character is removed for each vertical arrow. The vertical arrow cannot remove the characters to the left of the string; i.e., the term 'R='. These editing aids must be used before the string is terminated.

The actual storage of the string is handled by $STORE and it is stored at the location given by NEW. The program adds a 'D(' prefix to the string and a ')' suffix. These characters are needed for proper execution of the string.

An '@' is entered to terminate the string. At this point the $OPCK, $PBCK, and $AOCK subroutines are called to test the strings. They test for the proper context of the various operators, for properly formed parentheses and brackets, and for use of the ill-defined and-or operation (e.g., A + B.C). If any of these errors are found, the respective messages, "ILL-FORMED OPERA. -N.S.", "ILL-FORMED ( ) OR [ ] -N.S.", are printed. It should be noted that the testing is halted when the first error is found. If no errors are found the string is "stored" by updating NEW and storing the starting location of the string. Also all of the literals that were entered with the regular expression strings are given a flag in the seventh bit.
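
Two of these tests can be sketched briefly. The Python below is an assumption about how such checks might be written, not a description of $PBCK or $AOCK: the function names are hypothetical, the string is taken as ordinary text, and the and-or test assumes the parentheses have already been found to be well formed.

```python
PAIRS = {")": "(", "]": "["}


def well_formed(string):
    """True if every parenthesis and bracket is closed in the proper order."""
    stack = []
    for ch in string:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


def mixes_and_or(string):
    """True if '+' and '.' occur at the same nesting level, as in A + B.C."""
    seen = [set()]                 # operators seen so far at each open level
    for ch in string:
        if ch in "([":
            seen.append(set())
        elif ch in ")]":
            seen.pop()
        elif ch in "+.":
            seen[-1].add(ch)
            if len(seen[-1]) == 2:
                return True
    return False


print(well_formed("D((A*B+C)*AB*)"), well_formed("D((A*B+C*AB*)"))
print(mixes_and_or("A+B.C"), mixes_and_or("(A+B).C"))
```

Adding parentheses, as in '(A+B).C', places the two operators at different levels and removes the ambiguity.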

If there was an error, a new string may be entered; this string is written over the old string. A new string may be entered to replace a string which has already been stored. This does not remove the original string, but sets a pointer to the new string.

After the regular expression string has been stored substitution strings may be entered. These are entered in the form 'A = ---', where the 'A' is the literal in the regular expression string which is to be replaced by the string to the right of the equal mark. Only one level of substitution is permitted; thus the name of the substitution string (e.g., 'A') must have been entered in the symbol table and it must also have a flag set in the seventh bit. If this condition is not met the message "SUBSTITUTION NOT SEEN-N.S." is printed. Except for this test and the fact that the string is prefaced by '(' instead of 'D(', the loading, testing, and storing of the string is the same as for the regular expression string.

When the strings are formed the subroutine which forms the symbol table checks to see if it is full (more than 16 entries). If it is full the error message "TABLE FULL--EXEC. HLT." is printed. Likewise during the loading of the string (or during any other phase in which characters are being written in the string) the program checks to see if there is room left in the memory. If not, "MEMORY FULL-EXEC. HLT." is printed. (The upper limit of the available memory, normally '7540, is stored at location '0764 and can be changed. It should be set at five less than the desired upper limit.) Both of these errors are unrecoverable so the program automatically re-enters Phase I.

Phase III. When the program enters Phase III the routine $SUB is called. This routine inserts the substitution strings, if any, in the regular expression string. A copy of the regular expression string is then printed so that the operator can check the string. If no string was stored a '?' will be printed and control returns to Phase II; otherwise the program goes to Phase IV.

Phase IV. This phase performs the actual operation of taking the derivative and it is controlled by DCONT. This phase starts by calling $DRCPY to form a work copy of the regular expression string.

The subroutine $DIN is called. This is a specialized form of MAININ. The only control word that can be entered is "TERMINATE" and it is tested when a line feed is received. This control word causes the program to return to Phase I.

If the first character that was entered was a 'D' then the program enters a derivative name storage section of $DIN. The name has to be prefixed by 'D/', and then the operator inputs the string of characters which forms the derivative name. The only valid characters are the literals which were entered in the regular expression string. An invalid character will cause a '?' to be printed. A test of the length of the derivative name is made and if it exceeds 31 characters then "DERIVATIVE $ FULL-N.S." is printed and the name is not stored. During this phase all of the information is received via the keyboard.

As in Phase II a left arrow and a vertical arrow can be used to edit the name or the control word. When the name is followed by a '/' the routine returns to DCONT. If no string has been stored a '?' will be printed and the routine remains in $DIN.

When the program returns to DCONT the subroutine $DCLAS is called to classify the forms of the derivative of the string. $DCLAS in turn calls $DF1, $DF2, $DF38, $DF4, $DF6, and $DF7 to expand the string.

After one pass through the work string the program returns to DCONT. At this point a check is made to see if any changes were made in the work string. If a change was made, then the program returns to $DCLAS.

This is repeated until a pass is made with no changes in the string. At this point the string consists only of terms of the type 'D(' followed only by literals, which are in turn followed by a ')'. These terms can be combined by any of the valid regular expression operators.

DCONT then finds the first character in the derivative name and passes this to $MATCL. This routine has two purposes. The first is to perform the final operation of taking the derivative. This uses the rules:

D/A/(A...) = λ

D/B/(A...) = φ

$MATCL then calls four subroutines to simplify the string using the regular expression identities. As in $DCLAS this is an iterative operation which is repeated until there are no changes in the string.

DCONT then reforms the work string by adding 'D(' to the front and ')' to the end, as these were removed by $MATCL. The program then goes back to $DCLAS and the procedure is repeated using the second character in the derivative name. The total process is repeated until the last character in the derivative name is used. When this occurs the results are printed along with the name of the derivative. $ETATT is then called to see if the string contains lambda and it prints "CONTAINS λ" if the test passes.

At this point one complete derivative has been found and DCONT returns to the routine $DRCPY to obtain a new work string and the process is repeated. The program remains in Phase IV until the control word "TERMINATE" is entered.

(The user who wants to study the operation of the program can replace the following "NOPS" with calls to the printing routines. At location '4554 load '5301 and at '4555 load '1200. This causes the string to be printed for every pass through $DCLAS. At location '4610 he can load '5301, at location '4611 load '1233, at '4612 load '5301, and at '4613 load '1200. This causes the derivative name and the string to be printed for each pass through $MATCL. Due to the large amount of data that is produced and the slow speed of the printer these changes are not recommended for normal use.)

CHAPTER VI


CONCLUSION 

This paper has described a practical digital computer program which can be
used to find the derivative of a regular expression. Chapter II shows how useful
the regular expression can be in the design of a finite state, sequential machine.
The remaining chapters describe the techniques used to implement REXPRO and its
use.

The usefulness of this program can be seen by studying the second example
in Chapter II. The optional prints were added, as described in Chapter V, to print
the partial results for every pass through the subroutine $DCLAS. These partial
results are formed in approximately the same manner in which a designer would
form the derivative. The partial results, shown in Figure VI.1, were those obtained
during the process of calculating the derivative D/A/R. When one considers the
amount of work needed to form this derivative, and that this is only one of the
several derivatives that are needed for this example, it is clear that this program
drastically reduces the amount of work (and thus the possibility of errors) needed
to use the regular expression in design work.

This example shows one of the problems that are involved in using the
derivative of the regular expression. In the flow-graph which describes the machine
designed for example two, Figure II.6, it can be shown that two of the states are
identical. This implies that the derivatives associated with these states are also identical.
These derivatives are,


D/L/ = (\ + N*S)(N*(AN*S + L + LN*S + PN*S))*N*0, and
D/LN/ = (N*S(N*(AN*S + L + LN*S + PN*S))*N*0 + (N*(AN*S + L + LN*S
+ PN*S)(N*(AN*S + L + LN*S + PN*S))*N*0 + N*0))

which are not obviously the same.

The question arises at this point: how can these two derivatives be tested to
show that they are the same? Can this testing be done by the program so that
the operator does not have to monitor the operation of the program? Can we
guarantee that a machine will be formed with a minimum number of states?
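One possible way of approaching the first question, offered here only as a sketch and not as part of REXPRO, is to compare two expressions by repeatedly taking derivatives of both and checking at each step whether both or neither contain lambda. The tuple representation and all of the function names below are assumptions made for this illustration; the example uses the expression R of Figure VI.1 and two of its derivatives.

```python
# Regular expressions as nested tuples. This representation and every name
# below are illustrative assumptions, not REXPRO's internal string format.
PHI = ("phi",)   # the empty set
LAM = ("lam",)   # lambda, the empty string


def lit(c):
    return ("lit", c)


def alt(*parts):
    """OR of the parts, normalised: flattened, duplicates removed, sorted."""
    items = set()
    for p in parts:
        if p[0] == "alt":
            items.update(p[1])
        elif p != PHI:
            items.add(p)
    if not items:
        return PHI
    if len(items) == 1:
        return next(iter(items))
    return ("alt", tuple(sorted(items, key=repr)))


def cat(a, b):
    """Concatenation, using phi.P = phi and lambda.P = P."""
    if a == PHI or b == PHI:
        return PHI
    if a == LAM:
        return b
    if b == LAM:
        return a
    if a[0] == "cat":                      # keep products right-nested
        return cat(a[1], cat(a[2], b))
    return ("cat", a, b)


def star(a):
    if a in (PHI, LAM):
        return LAM
    return a if a[0] == "star" else ("star", a)


def seq(*parts):
    result = LAM
    for p in reversed(parts):
        result = cat(p, result)
    return result


def eta(r):
    """True if r contains lambda."""
    tag = r[0]
    if tag in ("lam", "star"):
        return True
    if tag == "alt":
        return any(eta(p) for p in r[1])
    if tag == "cat":
        return eta(r[1]) and eta(r[2])
    return False                           # phi or a literal


def deriv(r, c):
    """Derivative of r with respect to the single character c."""
    tag = r[0]
    if tag == "lit":
        return LAM if r[1] == c else PHI
    if tag == "alt":
        return alt(*(deriv(p, c) for p in r[1]))
    if tag == "cat":
        first = cat(deriv(r[1], c), r[2])
        return alt(first, deriv(r[2], c)) if eta(r[1]) else first
    if tag == "star":
        return cat(deriv(r[1], c), r)
    return PHI                             # phi or lambda


def equivalent(r, s, alphabet):
    """Compare r and s by exploring pairs of their derivatives."""
    seen, todo = set(), [(r, s)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        if eta(pair[0]) != eta(pair[1]):
            return False
        todo.extend((deriv(pair[0], c), deriv(pair[1], c)) for c in alphabet)
    return True


A, L, N, P, S, ZERO = (lit(c) for c in "ALNPS0")
Z = alt(seq(A, star(N), S), L, seq(L, star(N), S), seq(P, star(N), S))
R = seq(star(N), Z, star(seq(star(N), Z)), star(N), ZERO)
d_l = deriv(R, "L")        # derivative with respect to L
d_ln = deriv(d_l, "N")     # derivative with respect to L, then N
print(d_l == d_ln, equivalent(d_l, d_ln, "ALNPS0"))
```

In this sketch the two derivatives are different as written expressions, yet the pairwise search should report them as equivalent. Whether such a test could be fitted into the string representation used by REXPRO is a separate question that the sketch does not address.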

Another question arises from this work. Can the regular expression be
forced to generate a machine of a specific form? This machine would not necessarily
be a minimal state machine, but rather a machine with certain desired characteristics;
e.g., a machine that is easily implemented by a particular type of hardware.

It is hoped that, by using this program as a tool, these questions can be
answered.
R-EXP PROGRAM--READY
R=N*Z(N*Z)*N*0@

Z=AN*S+L+LN*S+PN*S@

EXECUTE

CHECK COPY..

D(N*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0)

D/A/

(D(N*)(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0+D((AN
*S+L+LN*S+PN*S))(N*(AN*S+L+LN*S+PN*S))*N*0)

(D((N)*)(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0+D(A
N*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0)

(D((N))(N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0+
(D(AN*S)+(D(L)+(D(LN*S)+D(P(N)*S))))(N*(AN*S+L+LN*S+PN*S
))*N*0)

(D(N)(N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0+(D
(A(N)*S)+(D(L)+(D(L(N)*S)+D(P)(N)*S)))(N*(AN*S+L+LN*S+PN
*S))*N*0)

(D(N)(N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0+(D
(A)(N)*S+(D(L)+(D(L)(N)*S+D(P)(N)*S)))(N*(AN*S+L+LN*S+PN
*S))*N*0)

(D(N)(N)*(AN*S+L+LN*S+PN*S)(N*(AN*S+L+LN*S+PN*S))*N*0+(D
(A)(N)*S+(D(L)+(D(L)(N)*S+D(P)(N)*S)))(N*(AN*S+L+LN*S+PN
*S))*N*0)

D/A/=

(N)*S(N*(AN*S+L+LN*S+PN*S))*N*0


Figure VI.1 The Derivative D/A/R
APPENDIX


APPENDIX  A 


PROGRAM  ALPHABET  AND  CODING 


The following list defines all of the characters that may be used to enter
regular expressions in the computer, their meaning, and coding. The ASCII code
is the value that is received from the teletype or used to output on the teletype.
The internal code is the value used to represent the characters in the program.
Character   ASCII      Internal Code   Meaning

A-C         '01-'03    '0-'17          Literals (internal code assigned
E-Q         '05-'21                    by symbol table)
S-Z         '23-'32
0-1         '60-'61
(           '50        '20
)           '51        '21
[           '33        '33             "NOT" delimiter
]           '35        '35
\           '34        '34             Lambda
-           '55        '25             Phi
D           '04        '30             Derivative
R           '22        not stored      R-EXP
=           '75        not stored
.           '56        '26             "AND"
@           '00        '37             End flag
/           '57        '27             Derivative delimiter
*           '52        '22             Star operator
NULL        internal   '36             NULL character
+           '53        '23             "OR"
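As an illustration of how the coding above could be applied, the following sketch converts an input string into internal codes. It is not taken from REXPRO: the function name, the dictionary layout, and in particular the rule that literals receive symbol table codes in order of first appearance are assumptions. The AND operator is entered here as the period, the character whose ASCII code is '56.

```python
# Internal codes for the operators, copied from the list above (octal).
OPERATOR_CODES = {
    "(": 0o20, ")": 0o21, "*": 0o22, "+": 0o23, "-": 0o25, ".": 0o26,
    "/": 0o27, "D": 0o30, "[": 0o33, "\\": 0o34, "]": 0o35, "@": 0o37,
}
LITERALS = "ABCEFGHIJKLMNOPQSTUVWXYZ01"   # D and R are reserved


def encode(text):
    """Return (codes, symbol_table) for one expression string."""
    symbols = {}
    codes = []
    for ch in text:
        if ch in OPERATOR_CODES:
            codes.append(OPERATOR_CODES[ch])
        elif ch in LITERALS:
            if ch not in symbols:
                if len(symbols) == 0o20:
                    raise ValueError("more than 16 distinct literals")
                # Assumed rule: codes '0 to '17 are handed out in order.
                symbols[ch] = len(symbols)
            codes.append(symbols[ch])
        else:
            raise ValueError(f"character {ch!r} is not in the program alphabet")
    return codes, symbols


codes, symbols = encode("AN*S+L@")
print([oct(c) for c in codes], symbols)
```

The limit of sixteen literals follows from the internal code range '0 to '17; how the actual symbol table orders its entries is not stated in this appendix.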
APPENDIX B


LIST OF VARIABLES

The following list gives all of the variables used in REXPRO, along with
their location and a description of their meaning. All of these variables are
stored in the direct reference portion of the SCC 650 memory.
Variable   Location   Description

APLUS      '45        'A' temporary storage
ATEMP      '45        'A' temporary storage
BC         '44        Brackets counter
CHRMAT     '44        Character match
DCTM       '44        Derivative counter
DRSTRT     '75        Start of derivative $
DPC        '44        Derivative pass counter
ETAVAL     '45        ETA value, η($)
LG         '43        Line counter
LEHP       '66        Level end half point
LEMP       '67        Level end memory point
LHP        '64        Level start half point
LMP        '65        Level start memory point
LRHP       '62        Last read half point
LRMP       '63        Last read memory point
MESST      '53        Message address
NEW        '73        Next available $ storage location
PC         '43        Parenthesis counter
RESTRT     '77        Start of program
RHP        '60        Read half point
RMP        '61        Read memory point
RSTART     '74        Start of R-EXP $
SYMS       '56        Location in symbol table
SA1        '50        $ add one at end of break point
SA2        '51        $ add two at end of break point
SRP        '52        $ return address at end of break point
TIN        '72        Type of input
THP        '54        Temporary half point
TMP        '55        Temporary memory point
T2HP       '56        Sec. temporary half point
T2MP       '57        Sec. temporary memory point
WHP        '70        Write half point
WMP        '71        Write memory point
XPLUS      '46        'X' temporary storage
XTEMP      '47        'X' temporary storage


APPENDIX  C 


PROGRAM  LOADING  ORDER  AND  MAP 

This  list  gives  the  order  in  which  the  various  subroutines,  and  the  mainline 
program,  for  REXPRO  are  loaded  in  the  memory.  For  those  interested  in  studying 
the  operation  of  the  subroutines  the  starting  address  of  each  of  the  subroutines  is 
given.  These  addresses  are  given  as  octal  (base  eight)  numbers. 
Routine    Address

ANORTT     0100
BIN        0117
BOUT       0140
CONVRT     0153
DERTST     0211
ENDTST     0223
LSTTST     0235
LLTST      0246
NULTST     0265
RLTST      0277
STRTST     0316
SYMF       0330
CRLF       0353
MESS       0401
S'TRD      0666
$RDNNL     0676
$READ      0705
$NWRT      0737
$REWRT     0747
$WRITE     0760
ALPH1 1    1022
READ1      1044
QUES       1076
SYMIN      1112
SYMSRH     1127
CONSYM     1153
$OUT       1200
DTITLE     1233
$SLVL      1306
$1SIM      1416
$2SIM      1525
$3SIM      1636
$4SIM      1724
$OPCK      1765
$COPY      2040
$AOCK      2061
$MATCL     2175
$DIN       2317
$PBCK      2506
$BPSET     2616
$RTBP      2724
$DRCPY     2761
$SUB       3003
$STORE     3045
$ETAT      324[illegible]
$ETATT     3335
$DF2       3500
$DF7       3655
$DF6       3725
$DF4       3751
$DF38      4033
$DF1       4143
$DCLAS     4213
MAININ     4346
DCONT      4542
MAINLINE   4704
APPENDIX D


DESCRIPTION  OF  SELECTED  SUBROUTINES 

In this appendix a detailed description is given for the subroutines $SLVL,
$BPSET, $RTBP, $DCLAS, $DF1, $DF2, $DF38, $DF4, $DF6, $DF7, $ETATT,
and $ETAT. The first three subroutines are used to perform basic string manipulations.
The remainder are used to form the derivative of the regular expression. These
subroutines were selected to describe the basic operation of REXPRO without
getting involved in the numerous, but necessary, subroutines that perform
secondary functions; e.g., the routines used for input/output.

These subroutines are described by means of a flowlist. The flowlist presents
the various steps involved in executing that routine. The symbols "A" and "X"
are used to represent the accumulator and index registers. The representation
(LRMP)* indicates an indirect operation; i.e., the value stored at the location
LRMP is used as a pointer to the desired location. The remaining terms in the
flowlist are self-explanatory.

$SLVL-(String search for level)-This subroutine is used to read one term of
a string and to give a relative classification of the level of the term. A term is
defined as a set of characters consisting of only the alpha characters and the star
operator, or any valid set of characters which are enclosed in a set of parentheses
or brackets. The level of a term indicates its ranking. Several terms that are
connected by the AND, the OR, or the concatenation operators are on the same
level. If these terms are enclosed in parentheses a new term is formed which has a
higher level than the individual terms. $SLVL gives a relative ranking of the level
by the way the subroutine returns to the calling program. A standard return is
executed if terms of the same level follow. A non-standard return is executed if
the term is followed by either a right parenthesis, right bracket, or an end flag;
i.e., no terms of the same level follow.

The search for a term is started at the location given by RMP/RHP. This
location should be on or before the start of the term. Upon return from $SLVL
the following information is available: the location of the start of the term is
stored in LMP/LHP, the end of the term is given by LEMP/LEHP, the location of
the character following the term is given by LRMP/LRHP, and "A" contains the
character following the term.

Figure D.1 shows several examples of strings and the resulting locations.
These locations are indicated by an arrow. The value of RMP/RHP before $SLVL
is called is represented by an S; all of the other locations given are those obtained
after a return from $SLVL.

A standard return is a return to the location following the call statement. A
non-standard return is a return to the second location following the call
statement.
Figure D.1 The Operation of $SLVL. [Four example strings, (a) through (d), with
arrows marking S, LMP/LHP, LEMP/LEHP, LRMP/LRHP, and RMP/RHP. Examples (a),
(b), and (c) end in a standard return; example (d) ends in a non-standard return.]
$SLVL is often used to read all of the terms, in a string, which are on the
same level. The set of operations needed to perform this is shown below.


recall:  $SLVL
         Jump → B                Standard return
         Jump → A                Non-standard return
    B:   RMP ← LRMP              Backspace read
         RHP ← LRHP
         Jump → recall           Read next term
    A:   Continue



An explanation is needed for the use of the read location backspacing.
Examining Figure D.1a it is seen that RMP/RHP gives the location of the first
character of the following term, so that $SLVL can be called without modifying
the read address. However, in the example shown in Figure D.1c, RMP/RHP
gives the location of the second character in the following term, but LRMP/LRHP
gives the location of the first character of that term. Thus the read address is
backspaced before calling $SLVL a second time.
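The behavior described above can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the string is held as an ordinary Python string ending in the end flag '@' and containing at least one term, NULL characters and half-word addressing are ignored, and the names read_term and terms_on_level are hypothetical. The returned values loosely correspond to LMP/LHP, LEMP/LEHP, LRMP/LRHP, the contents of "A", and the kind of return.

```python
ALPHA = set("ABCEFGHIJKLMNOPQSTUVWXYZ01\\-")   # literals, lambda (\) and phi (-)


def read_term(s, pos):
    """Read one term of s, starting at or before index pos.

    Returns (start, end, nxt, following, standard).
    """
    while s[pos] not in ALPHA and s[pos] not in "([":
        pos += 1                               # skip operators before the term
    start = pos
    if s[pos] in "([":
        depth = 0                              # plays the role of PC
        while True:
            if s[pos] in "([":
                depth += 1
            elif s[pos] in ")]":
                depth -= 1
            if depth == 0:
                break
            pos += 1
        end = pos
        pos += 1
        while s[pos] == "*":                   # a following star joins the term
            end = pos
            pos += 1
    else:
        while s[pos] in ALPHA or s[pos] == "*":
            pos += 1
        end = pos - 1
    following = s[pos]
    return start, end, pos, following, following not in ")]@"


def terms_on_level(s, pos=0):
    """Collect every term on one level, as in the calling sequence above."""
    terms = []
    while True:
        start, end, nxt, _, standard = read_term(s, pos)
        terms.append(s[start:end + 1])
        if not standard:                       # non-standard return: level ends
            return terms
        pos = nxt                              # resume at the following character


print(terms_on_level("AB*C+(A.B)*N@"))
```

Resuming at the index of the following character plays the part of the backspaced read address: if that character is the first character of the next term it is read again, and if it is an operator it is skipped.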

An outline of the operation of $SLVL is given in Flowlist D.1.
FLOWLIST D.1 $SLVL


Enter: $SLVL
      PC ← 0                                        Clear counter
      Read one cell
      IS: cell an alpha char. (yes → A)
      IS: cell = '(' or '[' (no → read one cell)
      PC ← 1                                        Term enclosed in
                                                    parenthesis
      LMP ← LRMP                                    Term start loc.
      LHP ← LRHP
      Read one cell
      IS: cell = '(' or '[' (no → test for ')' or ']')
      PC ← PC + 1
      Jump (read one cell)
      IS: cell = ')' or ']' (no → read one cell)    Check for end of term
      PC ← PC - 1
      IS: PC = 0 (no → read one cell)
      LEMP ← LRMP                                   Possible end of term
      LEHP ← LRHP
      Read next non-NULL char.
      IS: char. = '*' (yes → LEMP ← LRMP)           Check for following '*'
      Jump → B

A:    LMP ← LRMP                                    Term consists of
      LHP ← LRHP                                    alpha char.
      LEMP ← LRMP                                   Assume end of term
      LEHP ← LRHP
      Read next non-NULL char.
      IS: char = alpha or '*' (yes → LEMP ← LRMP)
B:    Continue                                      End of term has been
                                                    found and following
                                                    character has been
                                                    read
      IS: char = ')' or ']' or '@' (no → Standard return)
      Non-standard return                           No similar level
                                                    follows
      Standard return
$BPSET-(String break point set)-This routine is used to set a break point in
a string. The purpose of the break point is to allow characters to be inserted in a string.

The break point is set after the last character which was read before calling
$BPSET. This character appears in the accumulator and may be changed by
modifying "A" before the call. The index register contains the flag, if any, and may not
be changed. The first operation of $BPSET consists of rewriting this character in
the string. The following 1, 2, or 3 characters are read and stored in SA1 and
SA2. A flag is added to either SA1 or SA2 for use with $RTBP to set the return
breakpoint. SRP is used to store the string return point. This is an address
indicating where the string is to be continued.

After the characters have been stored a jump to the location given by NEW
is added. Figure D.2 shows the memory, before and after the call, for several
different types of strings. In these examples the break is to be made after the
character 'A' and NEW is equal to '6000.

After $BPSET is called the characters to be inserted in the string are stored
starting at the location given by NEW/0. This can be accomplished by using
$COPY or $NWRT. $RTBP is then called to insert the characters which were
removed by $BPSET and to set a return jump to the location given by SRP.
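The break point mechanism amounts to splicing new text into a string by means of jumps rather than by moving the rest of the string. The sketch below illustrates the idea only. It assumes one character per memory cell (the program packs two characters per word and uses flags, which is why SA1 and SA2 are needed), models a jump as a tuple, and uses the hypothetical names read_string and insert_after.

```python
JUMP = "JUMP"


def read_string(mem, addr, end_flag="@"):
    """Follow the string from addr, taking jumps, up to the end flag."""
    out = []
    while True:
        cell = mem[addr]
        if isinstance(cell, tuple):            # (JUMP, target): continue there
            addr = cell[1]
            continue
        out.append(cell)
        if cell == end_flag:
            return "".join(out)
        addr += 1


def insert_after(mem, addr, text, new):
    """Insert text after the character at addr; return the updated NEW."""
    displaced = mem[addr + 1]                  # cell given up to the jump (cf. SA1/SA2)
    ret = addr + 2                             # string return point (cf. SRP)
    mem[addr + 1] = (JUMP, new)                # set the break point
    for ch in text:                            # inserted characters start at NEW
        mem[new] = ch
        new += 1
    mem[new] = displaced                       # put back what the jump displaced
    mem[new + 1] = (JUMP, ret)                 # return jump
    return new + 2


memory = {0o5000 + i: ch for i, ch in enumerate("ABC@")}
new = insert_after(memory, 0o5000, "XY", 0o6000)
print(read_string(memory, 0o5000), oct(new))
```

Because a break point overwrites the cell after the break, any pointer saved to that cell no longer refers to the same character afterwards, which is consistent with the remark later in this appendix that expansions are carried out from right to left.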

An outline of the operation of $BPSET is shown in Flowlist D.2.
Figure D.2 Examples of the Operation of $BPSET. [Three examples, (a) through
(c), showing the memory starting at '5000 before and after the call, together with
the resulting values of SA1, SA2, and SRP.]
FLOWLIST D.2 $BPSET


Enter: $BPSET
      APLUS ← "A"
      XPLUS ← "X"
      SA2 ← 0
      IS: LRHP = 0 (yes → "A" ← APLUS)
      "A" ← APLUS + flag                    Character last read is
      Store in cell at loc. given           replaced with modified value
      by LRMP/LRHP                          and break point is set in
      APLUS ← NULL                          2nd cell
      Jump → A
      "A" ← APLUS                           Character last read is
      Store in cell at loc. given           replaced with modified value
      by LRMP/LRHP                          in 1st cell
      Read 2nd cell
      APLUS ← "A"
      XPLUS ← "X"
      Store F-NULL at loc. given            This location is in 2nd cell
      by LRMP/LRHP
      IS: XPLUS = '40 (yes → A)             Did last character read
      Jump → C                              have flag
A:    SA1 ← APLUS + XPLUS                   SA1 contains only character
                                            + flag removed from string.
                                            RMP contains return loc.
      LRMP ← LRMP + 1                       NEW is stored in this loc.
      Jump → B

C:    Continue
      SA1 ← APLUS                           First character or NULL
                                            removed from $
      Read one character
      Exchange halves of "A"
      APLUS ← "A"
      Read next character
      "A" ← "A" + APLUS                     "A" now contains both
                                            characters, less flag, stored
                                            in loc. following break point.
                                            "X" contains flag.
      IS: "A" = NULL, NULL                  Are both characters NULL's
          (yes → SA1 ← SA1 + flag)
      SA2 ← "A" + flag                      Set return flag
      Jump → B
      SA1 ← SA1 + flag                      Set return flag
B:    Store NEW at (LRMP)*                  Set break address
      SRP ← RMP                             Set return address
      Return
$RTBP-(string return breakpoint)-This subroutine is used to return the
breakpoint which was set by $BPSET. $RTBP adds the character which was stored in SA1
and then adds the two characters stored in SA2. The return address is stored in the
string and the value of NEW is updated. The read address is set to the string
location containing the character which was passed via SA1.

It should be noted that before calling $RTBP, WMP/WHP contains the
location of the first cell following the inserted string. This address resulted from the
routines which inserted the string: $COPY, $WRITE, or $NWRT. WMP/WHP may
not be modified before the call to $RTBP.

$RTBP checks the data that was passed via SA1 and SA2 to see if they contain
any information. SA1 contains only one character and this character will contain
information only if it does not contain a NULL without a flag. If SA2 contains any
information it will contain at least a flag; otherwise, SA2 will be zero and a flag
is added to SA1.

The reader should refer to Flowlist D.3 for an outline of the operation of this
routine.
FLOWLIST D.3 $RTBP


Enter: $RTBP
      RMP ← WMP                             Set read loc. to 1st string
      RHP ← WHP                             add loc.
      IS: WHP = 0                           Check loc. of next cell
          (no → Write SA1 in 2nd cell)
      IS: SA1 = '36 (yes → A)               Next cell is in 1st half
                                            of word. Check SA1 for
                                            NULL without a flag
      Write NULL in 1st cell                SA1 is to be written. Write
                                            NULL to fill 1st cell
      Write SA1 in 2nd cell                 Write SA1 in 2nd, even if
                                            it does not contain
                                            information, to fill out word
A:    IS: SA2 = 0 (yes → Store SRP)
      Store SA2 at (WMP)*                   SA2 contains information.
      WMP ← WMP + 1                         Store both characters at
                                            once
      Store SRP at (WMP)*                   Store return loc.
      NEW ← WMP + 1                         Update NEW
      Return
$DCLAS-(string derivative classification)-This subroutine is called by DCONT
to classify the substrings of the work string, as given below. After classifying,
$DCLAS calls the appropriate subroutine which does the actual expansion. The
different forms of the substrings are listed below with their expanded form on the
right.


D.1     D(($)*)       =  D($)($)*
D.2a    D(($1)($2))   =  (D($1)($2) + D($2)),   η($1) = λ
D.2b    D(($1)($2))   =  D($1)($2),             η($1) = φ
D.3     D($1 + $2)    =  (D($1) + D($2))
D.4     D([$])        =  [D($)]
D.5     *
D.6     D(($))        =  D($)
D.7a    D(A*....)     =  D((A)*....)
  or
D.7b    D(A....)      =  D(A....)
D.8     D($1 . $2)    =  (D($1) . D($2))

where 'A' is one or more literals and $ represents any substring.

$DCLAS starts at the left of the string, searches for the 'D', and saves its
location in THP/TMP. The following '(' is then found and its location stored in
T2HP/T2MP.

This parenthesis forms the start of a level. The number and type of the terms
are tested to find the characteristics of the substring. Figure D.3 shows the different
characteristics for the different forms and also the set of pointers which gives the
location of parts of the substring. For substrings of the forms 2 or 7 the final testing
needed to indicate the proper expansion is contained in the individual expansion
routines. It should be noted that forms 3 and 8 are identical except for the
connective operator and are expanded by the same subroutine.

* The first eight forms are numbered according to the subroutines which use these
forms. Due to a reorganization of the subroutines there is no form D.5.

After the substring is classified the proper expansion subroutine is called.
These subroutines are titled according to the form they handle; e.g., $DF1 expands
substrings of form 1. If any changes were made in the string by the expansion
subroutines this is indicated by incrementing DPC. When these subroutines return to
$DCLAS the read address (RHP/RMP) is left pointing to one of the characters in that
portion of the string which was modified. $DCLAS then searches for the next
character 'D' and the process is repeated.

$DCLAS returns to the calling routine when it encounters the end flag. The
calling routine then tests to see if any changes were made in the string. If changes
were made then $DCLAS is called again to see if any new forms were generated during
the last expansion. This process is repeated until there is a pass with no changes.

The reader is referred to Flowlist D.4 for an outline of this routine.
Figure D.3 Characteristics of Substrings. [Table giving, for each form of
substring, its characteristics and the pointers that locate its parts.]


FLOWLIST D.4 $DCLAS


Enter: $DCLAS
      RHP/RMP ← 0/DRSTRT                    Set string starting address
A:    Read a character
      IS: Char. = '@' (no → test for 'D')
      Return                                End of one pass
      IS: Char. = 'D' (no → A)
      THP/TMP ← LRHP/LRMP                   Save loc. of 'D'
      Find '('
      Save loc. in T2HP/T2MP
      Read a term                           $SLVL
      IS: Term last in level (yes → B)
                                            Forms 2, 3 or 8
      IS: Term followed by '+' or '.'
          (no → Read next term)
      Call: $DF38                           Form 3 or 8
      Jump → A
      Read next term
      IS: Term last in level
          (no → Read next term)
      Call: $DF2                            Form 2
      Jump → A

B:    Continued
      IS: Last char. = ')'                  Loc. of end character given
          (no → test for ']')               by LEHP/LEMP
      Call: $DF6                            Form 6
      Jump → A
      IS: Last char. = ']'
          (no → test for '*')
      Call: $DF4                            Form 4
      Jump → A
      IS: Last char. = '*'
          (yes → test first char.)
      Call: $DF7                            Form 7
      Jump → A
      IS: First char. = '('                 Loc. of first character given
          (no → Call: $DF7)                 by LHP/LMP
      Call: $DF1                            Form 1
      Jump → A
$DF1-(string expansion form 1)-This subroutine is called by $DCLAS to expand
strings of the form 'D(($)*)' to strings of the form 'D($)($)*'.

An outline of this routine is shown in Flowlist D.5. The reader should note
that this routine (and some of the other expansion routines) uses the end flag ('@')
not only to indicate the end of a string, but also to indicate the end
of the substring that is being tested or copied. In the flowlist the comments column
is used to represent the string during the different parts of the expansion process.
FLOWLIST D.5 $DF1


Enter: $DF1
      DPC ← DPC + 1                         A change is made
      RHP/RMP ← LHP/LMP
      Read term                             Via $SLVL
      Replace following char. with '@'      D(($)*@
      Set breakpoint                        Via $BPSET
      RHP/RMP ← LHP/LMP                     Start of term
      Copy term to flag                     ($)*
      Return breakpoint                     Via $RTBP
      RHP/RMP ← LEHP/LEMP                   End of term
      Read a character                      Reads '*'
      Replace with NULL                     D(($)@($)*
      Find flag
      Replace with ')'                      D(($))($)*
      Return
$DF2-(string expansion form 2)-This subroutine is called by $DCLAS to perform
the expansion given below.

D(($1)($2)) = D($1)($2),             η($1) = φ
D(($1)($2)) = (D($1)($2) + D($2)),   η($1) = λ

It should be remembered from the description of $DCLAS that LRHP/LRMP
points to the last right parenthesis in this substring. This fact is used to set a flag
which indicates the end of the second term. A flag is also placed at the end of
the first term to indicate its end. Thus when $ETATT is called the start of the first
term is given by LHP/LMP and it ends when the end flag is found.

$ETATT is the subroutine used to test a substring to see if it contains lambda;
i.e., it tests if η($) = λ. The result is returned via the variable ETAVAL. ETAVAL
is set to one if the substring contains lambda; otherwise, it is set to zero.
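The lambda test can be illustrated by a short recursive evaluation over the program's input notation. The sketch below is not a description of $ETATT itself, whose internal method is not given here. It assumes the usual rules (a starred term contains lambda, an OR contains lambda if either side does, an AND or a concatenation only if both sides do, and a NOT reverses the result), and it assumes that concatenation binds more tightly than '.', which in turn binds more tightly than '+'. The name eta_value and the parsing structure are hypothetical, and balanced parentheses and brackets are taken for granted.

```python
def eta_value(expr):
    """Return 1 if expr contains lambda, otherwise 0 (cf. ETAVAL)."""
    pos = 0

    def peek():
        return expr[pos] if pos < len(expr) else "@"

    def factor():
        nonlocal pos
        c = peek()
        pos += 1
        if c == "(":
            value = union()
            pos += 1                    # skip the matching ')'
        elif c == "[":
            value = 1 - union()         # NOT reverses the result
            pos += 1                    # skip the matching ']'
        elif c == "\\":
            value = 1                   # lambda itself
        elif c in ")]+.*@":
            raise ValueError(f"unexpected {c!r} at position {pos - 1}")
        else:
            value = 0                   # a literal, or phi ('-')
        while peek() == "*":
            pos += 1
            value = 1                   # a starred term always contains lambda
        return value

    def product():
        value = factor()
        while peek() not in "+.)]@":
            value &= factor()           # concatenation: both parts must pass
        return value

    def intersection():
        nonlocal pos
        value = product()
        while peek() == ".":
            pos += 1
            value &= product()          # AND: both sides must pass
        return value

    def union():
        nonlocal pos
        value = intersection()
        while peek() == "+":
            pos += 1
            value |= intersection()     # OR: either side may pass
        return value

    return union()


print(eta_value("N*"), eta_value("AN*S+L+LN*S+PN*S"), eta_value("\\+N*S"))
```

For the first term N* of the example in Figure VI.1 the sketch gives one, which corresponds to the form 2a expansion being selected on the first pass shown there.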

The expansion of the form 2 substring is shown in Flowlist D.6.
FLOWLIST D.6 $DF2


Enter: $DF2
      DPC ← DPC + 1
      Replace last ')' with '@'             D(($1)($2)@
      RHP/RMP ← T2HP/T2MP
      Read first '('
      Read following term
      Store following character in XTEMP
      Store location in T2HP/T2MP
      Replace with '@'                      D(($1)@$2)@
      Test η($1)                            Via $ETATT
      IS: ETAVAL = 0 (no → B)
      RHP/RMP ← T2HP/T2MP                   Does not contain lambda
      Replace flag with ')'                 D(($1))$2)@
      Set breakpoint
      Add character stored in XTEMP         D(($1))($2)@
      Return breakpoint
      Find end flag
      Replace with NULL                     D(($1))($2)
      Return

B:    Continue                              Contains lambda
      RHP/RMP ← T2HP/T2MP
      Replace flag with ')'
      Set breakpoint
      Add character stored in XTEMP         D(($1))($2)@
      Return breakpoint
      Find end flag
      Set breakpoint
      Add 'D('                              D(($1))($2)@D(
      RHP/RMP ← T2HP/T2MP                   Start of second term
      Copy second term to flag              Via $COPY
                                            D(($1))($2)@D(($2)
      Add '))'                              D(($1))($2)@D(($2)))
      Return breakpoint
      Find flag
      Replace with '+'                      D(($1))($2)+D(($2)))
      RHP/RMP ← THP/TMP                     Address of first 'D'
      Read character
      Replace with '('
      Set breakpoint
      Add 'D'
      Return breakpoint                     (D(($1))($2)+D(($2)))
      Return



$DF38-(string expansion forms 3 and 8)-$DF38 is called by $DCLAS to expand
the strings given below.

D($1 + $2) = (D($1) + D($2))

D($1 . $2) = (D($1) . D($2))

The expansions of these two forms are identical except for the operator which connects
the two terms; thus $DF38 can perform both expansions. In fact $DF38 does not
test for the operator, but finds it and stores it until needed without determining
the operator. The example string shown in Flowlist D.7 arbitrarily uses the operator
'+'.

The reader should note that in this expansion (and in some of the other
expansion forms) the string is expanded from right to left. This procedure is
dictated by the subroutine $BPSET, which is used to set the string breakpoint. When
$BPSET is called several characters in the cells following the location where the
breakpoint is set are moved. Thus, these characters will no longer be stored in the cells
whose locations were given by THP/TMP, T2HP/T2MP, etc.
FLOWLIST D.7 $DF38


Enter: $DF38                                D($1+$2)
      DPC ← DPC + 1
      RHP/RMP ← T2HP/T2MP                   Location of first '('
      Read term                             Finds end ')'
      RHP/RMP ← LEHP/LEMP                   Location of end ')'
      Read a character
      Set breakpoint
      Add ')'                               D($1+$2))
      Return breakpoint
      RHP/RMP ← T2HP/T2MP
      Read one character                    Reads '('
      Read following term                   Find '+' or '.'
      IS: Following character = '+' or '.'
          (no → Read following term)
      Save character in XTEMP
      Set breakpoint
      Replace operator with ')'             D($1)$2))
      Add character stored in XTEMP         D($1)+$2))
      Add 'D('                              D($1)+D($2))
      Return breakpoint
      RHP/RMP ← THP/TMP
      Read a character                      Reads 'D'
      Set breakpoint
      Replace character with '('            ($1)+D($2))
      Add 'D'
      Return breakpoint                     (D($1)+D($2))
      Return
$DF4-(string expansion form 4)-$DF4 is called by $DCLAS to expand the
substring given below to the form on the right.

D([$]) = [D($)]

The operation of this expansion subroutine is given in Flowlist D.8.

This routine is based on the use of $SLVL and thus gives the reader a chance
to study the operation of $SLVL without a lot of other operations being performed.
When $DF4 is called the location of the start of the inner term (the '[$]') is
given by LHP/LMP. This address is transferred to the read address (RHP/RMP)
and the term is read via $SLVL. $SLVL returns with the address of the first
character (the '[') in LHP/LMP, the address of the last character (the ']') in
LEHP/LEMP, and the address of the following character (the ')') in LRHP/LRMP.
FLOWLIST D.8  $DF4

Enter: $DF4                               D([$])
DPC ← DPC + 1
RHP/RMP ← LHP/LMP
Read one term                             Via $SLVL
Replace following character with ']'      D([$]]
RHP/RMP ← LEHP/LEMP
Read character                            Reads ']'
Replace with ')'                          D([$)]
RHP/RMP ← LHP/LMP
Read character                            Reads '['
Replace with '('                          D(($)]
RHP/RMP ← T2HP/T2MP
Read character                            Reads '('
Replace with 'D'                          DD($)]
RHP/RMP ← THP/TMP
Read character                            Reads 'D'
Replace with '['                          [D($)]
Return
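
The rewrite of 'D([...])' into '[D(...)]' needs no insertions, because the two forms have the same length: five characters are simply overwritten. The following is a minimal Python sketch of that idea, not the original machine code. The function name, the list-of-characters buffer, and the bracket-matching loop (a stand-in for the term reader) are assumptions made for illustration.

```python
def expand_form4(buf, d_index):
    """Rewrite D([...]) as [D(...)] in place.

    buf is a list of single characters; d_index is the position of the 'D'.
    """
    first = d_index + 2                      # position of the '['
    if buf[d_index:first + 1] != ["D", "(", "["]:
        raise ValueError("substring is not of the form D([...])")
    # Stand-in for the term reader: find the ']' that closes the '['.
    depth = 0
    last = None
    for i in range(first, len(buf)):
        if buf[i] in "([":
            depth += 1
        elif buf[i] in ")]":
            depth -= 1
            if depth == 0:
                last = i
                break
    if last is None or last + 1 >= len(buf) or buf[last + 1] != ")":
        raise ValueError("unbalanced term")
    following = last + 1
    # Overwrite five characters, working from the right end to the left.
    buf[following] = "]"
    buf[last] = ")"
    buf[first] = "("
    buf[d_index + 1] = "D"
    buf[d_index] = "["
    return buf
```

For example, applying `expand_form4` to the characters of `D([ab])` with `d_index = 0` leaves the buffer holding `[D(ab)]`.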

$DF6 - (string expansion form 6) - This subroutine is used to expand the
substring given below to the form on the right.

D(($)) = D($)

When $DF6 is called, the location of the inner left parenthesis is given
by LHP/LMP, while the location of the inner right parenthesis is given by
LEHP/LEMP. This routine replaces the inner parentheses with NULLs.
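
Overwriting the redundant parentheses with a placeholder avoids shifting the rest of the string. A small Python sketch of this is given below. The placeholder value, the function names, and the helper that hides placeholders when the string is read back are assumptions for illustration; the actual NULL code is not given in this section.

```python
NULL = "\0"  # assumed placeholder character

def expand_form6(buf, left, right):
    """Blank out the inner parentheses of D((...)) without moving any text."""
    if buf[left] != "(" or buf[right] != ")":
        raise ValueError("left/right must point at the inner parentheses")
    buf[left] = NULL
    buf[right] = NULL
    return buf

def visible(buf):
    """Return the string as it reads once NULL placeholders are skipped."""
    return "".join(ch for ch in buf if ch != NULL)
```

The buffer keeps its original length, so any addresses already saved for other parts of the string remain valid.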
$DF7 - (string expansion form 7) - This routine is used to expand the form
given below.

D(A* . . . A) = D((A)* . . . A), where A is any member of
the set of alpha characters (the literals, lambda, and phi). If this form does
not exist, no change is made in the string. The purpose of this expansion is to
prepare the string for processing by $DF2 and $DF1 on the following passes.

During the last pass through $DCLAS all of the substrings are of the form
'D(A . . . A)'; thus only $DF7 is called to expand the substrings. As $DF7
makes no change in these substrings, DPC is not incremented and its value
remains zero. A DPC of zero is used to indicate to DCONT that the last pass has
been made.

The operation of $DF7 is shown in Flowlist D.9.
FLOWLIST D.9  $DF7

Enter: $DF7
RHP/RMP ← LHP/LMP                         Term start
Read character
IS: It an alpha character                 no → the '*' test below
Store location at LHP/LMP
Jump → Read character
IS: Character = '*'                       yes → DPC ← DPC + 1
Return                                    Make no changes
DPC ← DPC + 1                             D(A* . . . A)
RHP/RMP ← LHP/LMP
Read character                            Reads alpha character
Store character in XTEMP
Set breakpoint
Replace character with '('                D((* . . . A)
Add character stored in XTEMP             D((A* . . . A)
Add ')'                                   D((A)* . . . A)
Return breakpoint
Return
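
The scan described above can be sketched in Python as follows. This is one reading of the flowlist and not a transcription of it. It assumes that the routine steps over alpha characters while remembering the most recent one, and wraps that character in parentheses only if the first non-alpha character is '*'. The alphabet of lower-case literals, the function name, and the returned change count (standing in for the DPC increment) are illustrative assumptions.

```python
LAMBDA, PHI = "λ", "φ"
ALPHA = set("abcdefghijklmnopqrstuvwxyz") | {LAMBDA, PHI}   # assumed alphabet

def expand_form7(s, start):
    """Wrap a starred alpha character in parentheses.

    start is the index of the first character after 'D('.
    Returns (new_string, number_of_changes).
    """
    i = start
    last_alpha = None
    while i < len(s) and s[i] in ALPHA:
        last_alpha = i          # remember where the latest alpha character is
        i += 1
    if last_alpha is None or i >= len(s) or s[i] != "*":
        return s, 0             # form not present: make no changes
    wrapped = s[:last_alpha] + "(" + s[last_alpha] + ")" + s[last_alpha + 1:]
    return wrapped, 1
```

A substring such as `D(abc)` comes back unchanged with a change count of zero, which is the condition used to recognize the last pass.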
$ETATT - (string eta function test) - $ETATT is called by $DF2 to calculate the
eta function of a substring. The starting address of this string is passed in LHP/LMP
and the end is indicated by the end flag. The eta function is used to determine if
a string contains lambda; thus $ETATT is also called by DCONT to test the final
string, giving the operator the information needed to determine the output of the
state associated with this string.

The following equation defines the eta function.

          λ, if λ ∈ $
η($) =
          φ, if λ ∉ $

From this definition the following rules can be obtained.

D.9   η(L) = φ, where L is any literal

D.10  η(λ) = λ

D.11  η(φ) = φ

D.12  η($*) = λ, where $ is any substring

D.13  η(($1)($2)) = η($1) . η($2)

D.14  η(f($1, $2)) = f(η($1), η($2)), where f is any Boolean function.

From these rules we can see that the eta function can be calculated as a
Boolean function if the concatenation operator is treated as the Boolean AND
operator. This procedure was used in the design of $ETATT. The resulting value of
the eta function was encoded as the Boolean variable ETAVAL, with a one representing
the condition where η = λ.
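
As an illustration of treating the eta function as a Boolean function, the sketch below evaluates it recursively over a small expression tree. The tuple encoding of expressions and the function name are assumptions made for this example; the program itself works on the character string, not on a tree. True stands for ETAVAL = 1 (the expression contains lambda).

```python
def eta(node):
    """Return True if the expression contains lambda, False otherwise.

    node is a tuple: ("lit", c), ("lambda",), ("phi",), ("star", x),
    ("not", x), ("cat", x, y), ("and", x, y) or ("or", x, y).
    """
    kind = node[0]
    if kind == "lambda":
        return True                 # eta(lambda) = lambda
    if kind in ("lit", "phi"):
        return False                # eta(literal) = eta(phi) = phi
    if kind == "star":
        return True                 # a starred substring always contains lambda
    if kind == "not":
        return not eta(node[1])     # Boolean function of one argument
    left, right = eta(node[1]), eta(node[2])
    if kind in ("cat", "and"):
        return left and right       # concatenation treated as Boolean AND
    if kind == "or":
        return left or right
    raise ValueError("unknown node kind: " + kind)
```

For instance, `eta(("cat", ("star", ("lit", "a")), ("lambda",)))` is True, while the union of `a` with `a` followed by `a*` gives False.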

The subroutine $ETATT is used to implement rules 5 and 6 (D.13 and D.14), while
$BETAT is called to implement the first four rules (D.9 through D.12). To understand
the logic of $ETATT it is necessary to introduce the following Boolean identities.

D.15  1 + X = 1

D.16  0 + X = X

D.17  1 . X = X

D.18  0 . X = 0, where X is either 1 or 0.

These identities are used to determine if a given term of the substring needs to be
tested and how the results are combined. To perform this determination $ETATT
was broken down into four modes or sub-sections.

AND-Lambda, OR-Phi Mode. This mode is used to implement rules D.16
and D.17. These rules state that if the previous term had a value of one and was
followed by the AND operator (or concatenation), or if the previous term had a
value of zero and was followed by the OR operator, then the results are determined
solely by the present term. When $ETATT is called the results are determined solely
by the term to be tested; thus $ETATT starts in this mode. In this mode the term
is read by $SLVL to obtain the starting and end addresses needed by $BETAT.

After $BETAT is called it can return one of three results. If $BETAT executes
a standard return then ETAVAL will be either one or zero, depending on whether the
term tested contained, or did not contain, lambda. Under this condition $ETATT
goes to the NEXT mode, which determines the next mode of the subroutine. On the
other hand, a non-standard return is executed by $BETAT to indicate that the term
was too complicated to evaluate. To evaluate this term it is necessary to simplify
the term by "going down in level".
OR-Lambda Mode. This mode is used to implement equation D.15. This
equation states that the results are known and that there is no need to test the
following terms on the same level. In this mode each term is read until the end of
the level is reached, at which time control goes to the NEXT mode.

AND-Phi Mode. In this mode, an implementation of equation D.18, all terms
are read and passed over until either the end of the level is reached or the OR
operator is found. When the end of the level is reached control goes to the NEXT mode.
If the OR operator is found, control goes to the OR-Phi mode.

NEXT Mode. This mode is used to test for the end of a string, find the next
mode, and implement the operation of negation. When this section is entered
the character following the last level has been read and is stored in the
accumulator for testing.

If this character is the end flag then the end of the substring has been found
and $ETATT returns with the current value of ETAVAL. If a right bracket is found
(this implies that the corresponding left bracket was passed over in the process of
"going down in level") the present value of ETAVAL is negated. The finding of
the OR operator causes control to go to the OR-Phi or the OR-Lambda mode,
depending on whether ETAVAL equals zero or one. The finding of the AND operator,
or the concatenation operator, causes control to go to either the AND-Phi or the
AND-Lambda mode, depending on the value of ETAVAL. The finding of any of the OR,
AND, or concatenation operators indicates that a new term follows. Thus control
is passed to the appropriate mode to evaluate this term.

If control has not yet branched to another mode then the character (by
elimination we can see that the character is either the ')' or the ']') is ignored
and the next character is read. Control remains in the NEXT mode and this
process is repeated.
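
The effect of the four modes on a single level can be sketched as a left-to-right scan that tests a term only when identities D.15 through D.18 say its value matters. The Python below is a simplified illustration under stated assumptions: the terms of one level are given as functions that are called only when a term must be tested, the operators between them are given as a list, any operator other than '+' is treated as AND or concatenation, and negation and "going down in level" are left out. The function and parameter names are hypothetical.

```python
def eta_level(terms, ops):
    """Evaluate eta over one level, skipping terms whose value cannot matter.

    terms: list of zero-argument functions, each returning True or False.
    ops:   list of operators between consecutive terms ('+' is OR,
           anything else is AND or concatenation).
    """
    if len(ops) != len(terms) - 1:
        raise ValueError("need exactly one operator between each pair of terms")
    value = terms[0]()              # start: result depends only on this term
    i = 0
    while i < len(ops):
        if ops[i] == "+":
            if value:
                return True         # OR-Lambda: 1 + X = 1, rest of level skipped
            value = terms[i + 1]()  # OR-Phi: 0 + X = X
            i += 1
        elif value:
            value = terms[i + 1]()  # AND-Lambda: 1 . X = X
            i += 1
        else:
            # AND-Phi: 0 . X = 0, pass over terms until an OR or end of level
            while i < len(ops) and ops[i] != "+":
                i += 1
    return value
```

With term values 0, 1, 1 and operators '.', '+', only the first and third terms are tested; the second is passed over in the AND-Phi branch.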

The operation of $ETATT is shown in Flowlist D.10.

FLOWLIST D.10  $ETATT

Enter: $ETATT
    RHP/RMP ← LHP/LMP                     AND-Lambda, OR-Phi Mode
A:  Read a term                           Find address for $BETAT
    Call: $BETAT
    IS: Return standard                   yes → "Read the term" below
    RHP/RMP ← LHP/LMP                     No decision
    Read first character of term          Go down in level
    Jump → A
    RHP/RMP ← LHP/LMP
    Read the term                         Finds character following term
    Jump → B                              Find next mode
C:  Read a term                           OR-Lambda Mode
    IS: Term last in level                no → C
    Jump → B                              Find next mode
D:  Read a term                           AND-Phi Mode
    IS: Term last in level                yes → B (find next mode)
    IS: Term followed by '+'              no → D
    Jump → A                              OR-Phi Mode

B:  IS: Term followed by '@'              no → ']' test; finds next mode
    Return                                Done
    IS: Term followed by ']'              no → '+' test
    ETAVAL ← NOT ETAVAL
    IS: Term followed by '+'              no → concatenation test
    IS: ETAVAL = 1                        yes → C (OR-Lambda mode)
    Jump → A                              OR-Phi mode
    IS: Term followed by concatenation    yes → final ETAVAL test; a '(', '[', or alpha character
    IS: Term followed by '.'              yes → final ETAVAL test
    Read next character
    Jump → B
    IS: ETAVAL = 1                        yes → A (AND-Lambda mode)
    Jump → D                              AND-Phi mode
$BETAT - (Basic eta function) - This subroutine is called by $ETATT to implement
the first five rules of the eta function (as given in the previous section on $ETATT).
The start of the term to be tested is passed via LHP/LMP, while the end of the term
is given by LEHP/LEMP.

If the term is enclosed in parentheses (or brackets) it will contain lambda if
the star operator follows the term (Rule D.12). This is indicated by executing a
standard return with ETAVAL equal to one. On the other hand, if the term is not
followed by the star operator then the lower level term(s) which form this term have to
be tested to determine the results. This is indicated by executing a non-standard
return to $ETATT, which finds these lower level term(s).

If the term is not enclosed in parentheses then it consists of only alpha characters
and the star operator (as given by the definition of "term" in the section on the
subroutine $SLVL). From the first five rules of the eta function we can see the term
contains lambda if, and only if, it consists of the following.

D.19  'λ'

D.20  'λ*'

D.21  'L*', where L is any one literal

D.22  'φ*'

D.23  Or any of the above terms concatenated.

$BETAT tests the terms for the above properties and executes a standard
return when the end of the term is found. At this time ETAVAL will be equal to one
if the term had the above properties; otherwise ETAVAL will be equal to zero.
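
The test for a term made up only of alpha characters and stars can be sketched in Python as below. This is an illustration of conditions D.19 through D.23 and not the original routine. The lower-case alphabet of literals, the function name, and the use of a Python string for the term are assumptions. The parenthesized case, which needs the non-standard return, is left out.

```python
LAMBDA, PHI = "λ", "φ"
ALPHA = set("abcdefghijklmnopqrstuvwxyz") | {LAMBDA, PHI}   # assumed alphabet

def basic_eta(term):
    """Return True if a term of alpha characters and '*' contains lambda."""
    eta_val = False
    i = 0
    while i < len(term) and term[i] in ALPHA:
        if term[i] == LAMBDA:
            eta_val = True              # lambda may or may not be starred
            i += 1
            if i < len(term) and term[i] == "*":
                i += 1
        else:
            i += 1                      # a literal or phi must be starred
            if i >= len(term) or term[i] != "*":
                return False            # does not contain lambda
            eta_val = True
            i += 1
    return eta_val
```

Under these assumptions `basic_eta("λa*φ*")` is True, while `basic_eta("a*b")` is False because the unstarred `b` rules out lambda.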

Flowlist D.11 shows the operation of $BETAT.

FLOWLIST D.11  $BETAT

Enter: $BETAT
    ETAVAL ← 0
    RHP/RMP ← LHP/LMP                     Address of 1st character of term
    Read character
    IS: Character = '(' or '['            yes → enclosed-term steps at the end
A:  IS: Character an alpha char.          yes → the 'λ' test; term consists of alpha char. and '*'
    Return                                End of term
    IS: Character = 'λ'                   yes → the second "ETAVAL ← 1" below
    Read next character                   A literal or phi must be followed by a '*'
    IS: Character = '*'                   no → "ETAVAL ← 0" below
    ETAVAL ← 1
    Read next character
    Jump → A
    ETAVAL ← 0                            Does not contain λ
    Return
    ETAVAL ← 1                            A 'λ' may, or may not, be followed by '*'
    Read next character
    IS: Character = '*'                   no → Jump → A
    Read next character
    Jump → A

    RHP/RMP ← LEHP/LEMP                   Term is enclosed in parentheses or brackets
    Read character                        Reads last character of term
    IS: Character = '*'                   no → Non-standard return
    ETAVAL ← 1                            Contains lambda
    Return
    Non-standard return                   No decision made


$MATCL - (string match and clear) - This subroutine is called by DCONT to
perform the operation of taking the derivative and simplifying the string. The
final derivative operation is defined as

                          λ(A2 . . . An), if A = A1
D_A(A1 A2 . . . An) =
                          φ, if A ≠ A1

$MATCL performs the above operation by comparing the derivative name
character (passed via CHRMAT) with the first character in the term enclosed by
the characters 'D(' and ')'. During this operation these enclosing characters
are removed.
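
The final derivative operation can be illustrated with a short Python sketch. It assumes that, at this stage, every derivative term has the form 'D(' followed by alpha characters and ')', and it uses a regular expression in place of the character-by-character scan. The function name and the string representation are assumptions for illustration.

```python
import re

LAMBDA, PHI = "λ", "φ"

def match_and_clear(s, name):
    """Replace each D(A1 A2 ... An) by λ A2 ... An if A1 == name, else by φ."""
    def derive(match):
        term = match.group(1)
        if term[:1] == name:
            return LAMBDA + term[1:]    # first character matches: becomes lambda
        return PHI                      # no match: the whole term becomes phi
    # Each match consumes the enclosing 'D(' and ')' along with the term.
    return re.sub(r"D\(([^()]*)\)", derive, s)
```

For example, `match_and_clear("D(ab)+D(ba)", "a")` gives `λb+φ`, which is then ready for simplification.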

When the end flag is found the operation halts and four subroutines ($1SIM,
$2SIM, $3SIM, and $4SIM) are called to simplify the string. Simplification is an
iterative process similar to the expansion process performed by $DCLAS; in other
words, multiple passes are made through the simplification routines until a pass is
found during which no changes were made. The number of changes made is tallied
in the variable DPC. It should be noted that the read address is set to point to the
start of the string before calling, rather than within the subroutines.
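
The pass-until-no-change structure can be sketched as follows. The two rewriting rules shown are hypothetical stand-ins chosen only to make the loop runnable; they are not the rules of the four simplification subroutines, which are not described in this section. The local variable `dpc` plays the part of the change tally.

```python
import re

def drop_lambda(s):
    # Hypothetical rule: lambda concatenated with a literal is just the literal.
    return re.subn(r"λ(?=[a-z])", "", s)

def drop_phi_term(s):
    # Hypothetical rule: a phi term at the start of a union adds nothing.
    return re.subn(r"(?:^|(?<=\+))φ\+", "", s)

def simplify(s, passes=(drop_lambda, drop_phi_term)):
    """Apply every pass repeatedly until one full round makes no change.

    Returns (simplified_string, number_of_rounds).
    """
    rounds = 0
    while True:
        rounds += 1
        dpc = 0                          # changes tallied during this round
        for simplify_pass in passes:     # each pass starts at the string start
            s, changed = simplify_pass(s)
            dpc += changed
        if dpc == 0:
            return s, rounds
```

With these stand-in rules the string `λb+φ+λλc` needs three rounds: two that make changes and a final one that confirms nothing is left to do.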

The operation of $MATCL is shown in Flowlist D.12.

FLOWLIST D.12  $MATCL

Enter: $MATCL
    RHP/RMP ← 0/DRSTRT                    Starting address
A:  Read character
    IS: Character = '@'                   no → the 'D' test below
    RHP/RMP ← 0/DRSTRT
    Simplify string
    IS: DPC = 0                           no → repeat from "RHP/RMP ← 0/DRSTRT"
    Return
    IS: Character = 'D'                   no → A
    Replace 'D' with NULL
    Read next character
    Replace with NULL
    Read next character
    IS: Character = CHRMAT                no → B
    Replace with 'λ'
    Find next ')'
    Replace with NULL
    Jump → A
B